#### *3.2. Criteria*

This study compares the various independent and dependent software reliability models and the proposed model introduced in Table 1 using 11 criteria. Each criterion is based on the difference between the actual observed values and the estimated values, and several also reflect the number of parameters used in each model, so that the better model can be identified.

First, the mean squared error (MSE) is the sum of the squared distances between the estimated values and the actual values, divided by the number of observations less the number of parameters [37].

$$\text{MSE} = \frac{\sum\_{i=1}^{n} (\hat{m}(t\_i) - y\_i)^2}{n - m} \tag{11}$$

where *m̂*(*t<sub>i</sub>*) is the estimated value of the model *m*(*t*) at time *t<sub>i</sub>*, *y<sub>i</sub>* is the actual observed value, *n* is the number of observations, and *m* is the number of parameters in each model.

Second, the mean absolute error (MAE) is the sum of the absolute differences between the estimated number of failures and the actual values, divided by the number of observations less the number of parameters [38].

$$\text{MAE} = \frac{\sum\_{i=1}^{n} |\hat{m}(t\_i) - y\_i|}{n - m} \tag{12}$$

Third, Adj\_R<sup>2</sup> is the adjusted coefficient of determination of the regression equation; it measures the explanatory power of the model while accounting for the number of parameters [39].

$$\mathbf{R}^2 = 1 - \frac{\sum\_{i=1}^n \left(\hat{m}(t\_i) - y\_i\right)^2}{\sum\_{i=1}^n \left(y\_i - \overline{y\_i}\right)^2}, \text{Adj}\\_\mathbf{R}^2 = 1 - \frac{\left(1 - R^2\right)(n - 1)}{n - m - 1} \tag{13}$$
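The first three criteria can be sketched in a few lines of Python. This is only an illustration of Equations (11)–(13): the function name `fit_criteria` and the sample data are hypothetical and are not taken from Table 2.

```python
def fit_criteria(y, m_hat, m):
    """Return MSE, MAE, R^2 and Adj_R^2 following Equations (11)-(13).

    y     -- observed cumulative failures y_i
    m_hat -- model estimates m_hat(t_i) at the same time points
    m     -- number of model parameters
    """
    n = len(y)
    sse = sum((mh - yi) ** 2 for mh, yi in zip(m_hat, y))  # squared errors
    sae = sum(abs(mh - yi) for mh, yi in zip(m_hat, y))    # absolute errors
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    r2 = 1.0 - sse / sst
    return {
        "MSE": sse / (n - m),
        "MAE": sae / (n - m),
        "R2": r2,
        "Adj_R2": 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1),
    }

# Hypothetical data, for illustration only.
y = [2.0, 4.0, 6.0, 8.0, 10.0]
m_hat = [3.0, 4.0, 6.0, 8.0, 9.0]
result = fit_criteria(y, m_hat, m=2)
print(result)
```

Because both MSE and MAE divide by *n* − *m*, a model with more parameters is penalized even when its raw errors are the same.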

Fourth, the predictive ratio risk (PRR) measures the distance from the actual value to the estimated value relative to the estimated value [40].

$$\text{PRR} = \sum\_{i=1}^{n} \left( \frac{\hat{m}(t\_i) - y\_i}{\hat{m}(t\_i)} \right)^2 \tag{14}$$

Fifth, the predictive power (PP) measures the distance from the actual value to the estimated value relative to the actual value [41].

$$\text{PP} = \sum\_{i=1}^{n} \left( \frac{\hat{m}(t\_i) - y\_i}{y\_i} \right)^2 \tag{15}$$
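The difference between PRR and PP lies only in the denominator, which a small sketch makes visible. The function names and the data below are hypothetical; the code assumes that no estimate and no observation is zero, since either would make a ratio undefined.

```python
def prr(y, m_hat):
    """Predictive ratio risk, Equation (14): error relative to the estimate."""
    return sum(((mh - yi) / mh) ** 2 for mh, yi in zip(m_hat, y))

def pp(y, m_hat):
    """Predictive power, Equation (15): error relative to the observation."""
    return sum(((mh - yi) / yi) ** 2 for mh, yi in zip(m_hat, y))

# Hypothetical data in which the model overestimates two of three points.
y = [2.0, 4.0, 8.0]
m_hat = [4.0, 4.0, 10.0]
prr_value = prr(y, m_hat)
pp_value = pp(y, m_hat)
print(prr_value, pp_value)
```

For the same absolute error, dividing by a smaller estimate gives a larger PRR term, so PRR weighs underestimation more heavily, whereas PP weighs overestimation more heavily.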

Sixth, Akaike's information criterion (AIC) compares models through their maximized likelihood functions. It is applied to minimize the Kullback–Leibler divergence between the probability distribution of the model and the data [42].

$$\begin{aligned} \text{AIC} &= -2\log L + 2m\\ L &= \prod\_{i=1}^{n} \frac{(m(t\_i) - m(t\_{i-1}))^{y\_i - y\_{i-1}}}{(y\_i - y\_{i-1})!} e^{-(m(t\_i) - m(t\_{i-1}))}\\ \log L &= \sum\_{i=1}^{n} \Big\{ (y\_i - y\_{i-1})\ln(m(t\_i) - m(t\_{i-1})) - (m(t\_i) - m(t\_{i-1})) \\ &\qquad - \ln((y\_i - y\_{i-1})!) \Big\} \end{aligned} \tag{16}$$
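As a rough sketch of Equation (16), the log-likelihood can be accumulated over the increments between consecutive time points. The function name `aic` and the data are hypothetical, and the code assumes *m*(*t*<sub>0</sub>) = 0 and *y*<sub>0</sub> = 0. It raises an error when a model increment is not positive, because the logarithm is then undefined.

```python
import math

def aic(m_vals, y, n_params):
    """AIC per Equation (16).

    m_vals   -- model values m(t_1), ..., m(t_n)
    y        -- observed cumulative failures y_1, ..., y_n (integers)
    n_params -- number of model parameters m
    Assumes m(t_0) = 0 and y_0 = 0 (an assumption of this sketch).
    """
    log_l = 0.0
    prev_m, prev_y = 0.0, 0
    for m_i, y_i in zip(m_vals, y):
        dm = m_i - prev_m  # expected failures in the interval
        dy = y_i - prev_y  # observed failures in the interval
        if dm <= 0.0:
            raise ValueError("non-positive model increment: ln(dm) is undefined")
        # lgamma(dy + 1) equals ln(dy!)
        log_l += dy * math.log(dm) - dm - math.lgamma(dy + 1)
        prev_m, prev_y = m_i, y_i
    return -2.0 * log_l + 2.0 * n_params

# Hypothetical data, for illustration only.
aic_value = aic([2.0, 3.5, 4.5], [2, 3, 5], n_params=2)
print(aic_value)
```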

Seventh, the predicted relative variation (PRV) is the standard deviation of the prediction bias and is defined as [43]

$$\text{PRV} = \sqrt{\frac{\sum\_{i=1}^{n} (y\_i - \hat{m}(t\_i) - Bias)^2}{n-1}} \tag{17}$$

Here, the bias is

$$\text{Bias} = \frac{\sum\_{i=1}^{n} \left(\hat{m}(t\_i) - y\_i\right)}{n}$$

Eighth, the root mean square prediction error (RMSPE) estimates how closely the model predicts the observations; the Variance term is taken here to be the PRV of Equation (17) [44]:

$$\text{RMSPE} = \sqrt{Variance^2 + Bias^2} \tag{18}$$
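The bias, PRV, and RMSPE build on one another, as the following sketch shows. It assumes that the Variance term in Equation (18) is the PRV of Equation (17); the function names and data are hypothetical.

```python
import math

def bias(y, m_hat):
    """Mean signed error of the estimates."""
    return sum(mh - yi for mh, yi in zip(m_hat, y)) / len(y)

def prv(y, m_hat):
    """Predicted relative variation, Equation (17)."""
    b = bias(y, m_hat)
    n = len(y)
    return math.sqrt(sum((yi - mh - b) ** 2 for mh, yi in zip(m_hat, y)) / (n - 1))

def rmspe(y, m_hat):
    """Equation (18), with Variance assumed to be the PRV."""
    return math.sqrt(prv(y, m_hat) ** 2 + bias(y, m_hat) ** 2)

# Hypothetical data, for illustration only.
y = [2.0, 4.0, 6.0, 8.0]
m_hat = [3.0, 4.0, 7.0, 8.0]
b, v, r = bias(y, m_hat), prv(y, m_hat), rmspe(y, m_hat)
print(b, v, r)
```

When the bias is small, RMSPE is only marginally larger than PRV.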

Ninth, the mean error of prediction (MEOP) sums the absolute value of the deviation between the actual data and the estimated curve and is defined as [38]

$$\text{MEOP} = \frac{\sum\_{i=1}^{n} |\widehat{m}(t\_i) - y\_i|}{n - m + 1} \tag{19}$$

Tenth, the Theil statistic (TS) is the average percentage of deviation over all periods with regard to the actual values. The closer the Theil statistic is to zero, the better the prediction capability of the model. This is defined as [45]

$$\text{TS} = 100 \ast \sqrt{\frac{\sum\_{i=1}^{n} (y\_i - \hat{m}(t\_i))^2}{\sum\_{i=1}^{n} y\_i^2}} \% \tag{20}$$

Eleventh, Pham's criterion (PC) takes into account the tradeoff between the uncertainty in the model and the number of parameters in the model by slightly increasing the penalty each time a parameter is added to the model when the sample is considerably small [46].

$$\text{PC} = \left(\frac{n-m}{2}\right) \log\left(\frac{\sum\_{i=1}^{n} (\widehat{m}(t\_i) - y\_i)^2}{n}\right) + m \left(\frac{n-1}{n-m}\right) \tag{21}$$
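The last three criteria, Equations (19)–(21), can be sketched in the same style. The function name `meop_ts_pc` and the data are hypothetical, and the logarithm in PC is assumed to be the natural logarithm.

```python
import math

def meop_ts_pc(y, m_hat, m):
    """Return MEOP, TS (in percent) and PC following Equations (19)-(21)."""
    n = len(y)
    sse = sum((mh - yi) ** 2 for mh, yi in zip(m_hat, y))
    sae = sum(abs(mh - yi) for mh, yi in zip(m_hat, y))
    meop = sae / (n - m + 1)
    ts = 100.0 * math.sqrt(sse / sum(yi ** 2 for yi in y))
    # Natural logarithm assumed for the log term of PC.
    pc = ((n - m) / 2.0) * math.log(sse / n) + m * (n - 1) / (n - m)
    return meop, ts, pc

# Hypothetical data, for illustration only.
y = [2.0, 4.0, 6.0, 8.0, 10.0]
m_hat = [3.0, 4.0, 6.0, 8.0, 9.0]
meop, ts, pc = meop_ts_pc(y, m_hat, m=2)
print(meop, ts, pc)
```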

Based on the above criteria, we compared the proposed model with the existing NHPP software reliability models. The closer Adj\_R<sup>2</sup> is to 1 and the closer the other 10 criteria are to 0, the better the fit. Using R and MATLAB, the parameters of each model were estimated by the least squares estimation (LSE) method, and the goodness of fit was then calculated to compare the models. LSE estimates the parameters by minimizing the squared difference between each model in Table 1 and the actual number of failures in Table 2: LSE = ∑<sub>*t*=1</sub><sup>*n*</sup> (*y<sub>t</sub>* − *m*(*t*))<sup>2</sup> [47].
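The LSE procedure can be illustrated with a minimal Python sketch. The study itself used R and MATLAB, so this is not the authors' code. Because the equation of the proposed model is not restated in this section, the sketch substitutes the Goel-Okumoto mean value function *m*(*t*) = *a*(1 − e<sup>−*bt*</sup>) as a stand-in, and the data are synthetic.

```python
import math

def go_mean(t, a, b):
    """Goel-Okumoto mean value function, used here only as a stand-in for m(t)."""
    return a * (1.0 - math.exp(-b * t))

def fit_go_lse(times, y, b_lo=1e-3, b_hi=2.0, steps=20000):
    """Minimize LSE = sum (y_t - m(t))^2 over (a, b).

    For a fixed b the best a has a closed form (linear least squares),
    so only b is scanned on a uniform grid.
    """
    best = None
    for k in range(steps + 1):
        b = b_lo + (b_hi - b_lo) * k / steps
        g = [1.0 - math.exp(-b * t) for t in times]
        a = sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)
        lse = sum((yi - a * gi) ** 2 for yi, gi in zip(y, g))
        if best is None or lse < best[2]:
            best = (a, b, lse)
    return best

# Synthetic, noise-free data generated from known parameters (not Table 2).
times = list(range(1, 16))
y = [go_mean(t, 50.0, 0.2) for t in times]
a_hat, b_hat, lse = fit_go_lse(times, y)
print(a_hat, b_hat, lse)
```

A model with more parameters, such as the four-parameter proposed model, would normally be fitted with a general nonlinear least squares routine rather than a one-dimensional scan.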

#### *3.3. Results of Dataset 1*

Table 3 shows the estimated values for the parameters of each model obtained using dataset 1. The parameter estimates of the proposed model are *â* = 80.0907, *b̂* = 0.07231, *ĉ* = 15.9288, and *ĥ* = 9.8182. Figure 2 shows the estimated value of *m*(*t*) at each time point, calculated from each model equation and the cumulative number of failures in dataset 1. The black dotted line represents the actual data, and the dark red solid line represents the failures predicted by the proposed model at each time point. Compared with the other models, the proposed model gives the predicted values closest to the actual values.



**Table 3.** Parameter estimation of model from dataset 1.


Table 4 shows the criteria calculated for each model using the parameters obtained from dataset 1. The proposed model has the smallest values of MSE, MAE, PRR, PP, PRV, RMSPE, MEOP, TS, and PC, at 9.9274, 3.2140, 0.0647, 0.0577, 2.6866, 2.6870, 2.8569, 4.6682, and 13.0594, respectively. Its AIC is the second smallest, at 73.4605, and its Adj\_R<sup>2</sup> is 0.9821, the closest to 1. The DPF model has the smallest AIC, at 73.2850, and ranks second on the other criteria. The Vtub model ranks third.

**Table 4.** Comparison of all criteria from dataset 1.


#### *3.4. Results of Dataset 2*

Table 5 shows the estimated values for the parameters of each model obtained using dataset 2. The parameter estimates of the proposed model are *â* = 79.1444, *b̂* = 0.2001, *ĉ* = 72.3208, and *ĥ* = 9.3327. Figure 3 shows the estimated value of *m*(*t*) at each time point, calculated from each model equation and the cumulative number of failures in dataset 2. Here, the black dotted line represents the actual data, whereas the dark red solid line represents the failures predicted by the proposed model at each time point. Compared with the other models, the proposed model gives the predicted values closest to the actual values.

Table 6 shows the criteria calculated for each model using the parameters obtained from dataset 2. The proposed model has the smallest values of MSE, MAE, PRR, PP, AIC, PRV, RMSPE, MEOP, TS, and PC, at 18.9722, 4.3544, 0.1615, 0.1482, 92.2155, 3.7139, 3.7145, 3.8706, 6.3751, and 15.6500, respectively, and its Adj\_R<sup>2</sup> of 0.9723 is the closest to 1. DPF is the second best-fitting model, and Vtub is the third.


**Figure 3.** Prediction of all models for dataset 2.

**Table 6.** Comparison of all criteria from dataset 2.


#### *3.5. Results of Dataset 3*

Table 7 shows the estimated values for the parameters of each model obtained using dataset 3. The parameter estimates of the proposed model are *â* = 194.7684, *b̂* = 0.3062, *ĉ* = 307.0805, and *ĥ* = 135.5641. Figure 4 shows the estimated value of *m*(*t*) at each time point, calculated from each model equation and the cumulative number of failures in dataset 3. The black dotted line indicates the actual data, and the dark red solid line is the failure value predicted by the proposed model at each time point. Compared with the other models, the proposed model gives the predicted values closest to the actual values.

Table 8 shows the criteria calculated for each model using the parameters obtained from dataset 3. The MSE and PC of the proposed model are the smallest, at 26.8047 and 24.5551, respectively, and its Adj\_R<sup>2</sup> is the closest to 1, at 0.9765. In addition, its MAE, PRR, PP, PRV, RMSPE, MEOP, and TS are 4.9209, 0.0096, 0.0092, 4.6668, 4.6668, 4.5694, and 2.5484, respectively, which are the second smallest values. Figure 4 shows the estimated failure values at each time point using the developed models. The Vtub model has the best values of PRR, PP, PRV, RMSPE, and TS, at 0.0094, 0.0090, 4.6356, 4.6357, and 2.5315, and DPF has the best values of MAE and MEOP, at 4.9195 and 4.5682. However, for the Vtub model, DPF, KSRGM, and the newly proposed model, the reported AIC is the value obtained up to *t* = 14, because the calculation cannot be continued beyond that point. In the AIC calculation, if the value at a specific point in time does not differ from the value at the next point in time, the denominator is 0, so the AIC cannot be calculated.

**Table 7.** Parameter estimation of model from dataset 3.


**Figure 4.** Prediction of all models for dataset 3.



#### **4. Optimal Release Time**

When releasing software, it is very important to find the optimal release time, that is, the time that minimizes the cost. We apply the *m*(*t*) proposed in Section 2 to a cost model to find the optimal point between time to market and minimum cost. The optimal time is suggested based on a cost model that reflects the software installation cost, the software test cost, the operation cost, the software removal cost, and the risk cost incurred when a software failure occurs. Figure 5 describes the software field environment of the cost model, starting from the software installation. The expected software cost model follows Equation (22) [30,31].

$$C(T) = C\_0 + C\_1 T + C\_2 m(T) + C\_3 (1 - R(x|T)) \tag{22}$$

where *C*0 is the installation cost for system testing, *C*1 is the system test cost per unit time, *C*2 is the error removal cost per unit time during the test phase, and *C*3 is the penalty cost owing to a system failure. In addition, *x* represents the time for which the software is used, and *R*(*x*|*T*) in the cost model follows Equation (23) [32,33].

$$R(x|T) = e^{-\left[m(T+x) - m(T)\right]} \tag{23}$$

**Figure 5.** System cost model structure.

In this section, we propose a cost model using dataset 1 based on the proposed software reliability model and find the optimal time point between time to market and the minimum cost by changing the cost coefficients from *C*0 to *C*3.

#### *4.1. Results of the Optimal Release Time*

For the parameters of the cost model, the values of *a*, *b*, *c*, and *h* calculated in the numerical examples of Section 3 were used. The cost coefficients are then varied to find the optimal release time with the lowest cost. The baseline values of the cost coefficients are as follows:

$$C\_0 = 500, \ C\_1 = 20, \ C\_2 = 50, \ C\_3 = 5000, \ x = 6$$

Here, baseline denotes the reference value against which changes in the cost coefficients are checked. The total cost obtained at the baseline is 4888.856, and the corresponding optimal release time *T*<sup>∗</sup> is 18.3. Table 9 varies the values around the baseline, reports the minimum cost *C*(*T*) and the optimal release time *T*<sup>∗</sup>, and shows how they change. When *x* = 2, the smallest total cost is 4886.985 at *T*<sup>∗</sup> = 18.2. When *x* = 4, the smallest total cost is 4888.735 at *T*<sup>∗</sup> = 18.3. When *x* = 6, the smallest total cost is 4888.856 at *T*<sup>∗</sup> = 18.3. When *x* is 8 or 10, the smallest total cost is 4888.863 at *T*<sup>∗</sup> = 18.3.
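The search for *T*<sup>∗</sup> can be sketched as a grid search over Equation (22). Because the equation of the proposed *m*(*t*) is not restated here, the sketch uses a hypothetical stand-in, *m*(*t*) = *a*(1 − e<sup>−*bt*</sup>), with made-up parameter values, so its output does not reproduce Table 9. The grid step of 0.1 is an assumption that matches the one-decimal release times reported in the text.

```python
import math

def mean_value(t, a, b):
    """Stand-in mean value function m(t) = a(1 - exp(-b t)); NOT the proposed model."""
    return a * (1.0 - math.exp(-b * t))

def reliability(x, T, a, b):
    """R(x|T) = exp(-[m(T + x) - m(T)]), Equation (23)."""
    return math.exp(-(mean_value(T + x, a, b) - mean_value(T, a, b)))

def total_cost(T, a, b, c0=500.0, c1=20.0, c2=50.0, c3=5000.0, x=6.0):
    """Expected total cost C(T), Equation (22); defaults are the baseline coefficients."""
    return (c0 + c1 * T + c2 * mean_value(T, a, b)
            + c3 * (1.0 - reliability(x, T, a, b)))

def optimal_release(a, b, t_max=100.0, step=0.1, **coeffs):
    """Grid search for the T in (0, t_max] that minimizes C(T)."""
    n_steps = int(round(t_max / step))
    grid = [k * step for k in range(1, n_steps + 1)]
    t_star = min(grid, key=lambda T: total_cost(T, a, b, **coeffs))
    return t_star, total_cost(t_star, a, b, **coeffs)

# Hypothetical parameter values, chosen only to make the sketch runnable.
A_HAT, B_HAT = 80.0, 0.2
for x in (2.0, 4.0, 6.0, 8.0, 10.0):
    t_star, c_min = optimal_release(A_HAT, B_HAT, x=x)
    print(f"x = {x:4.1f}  T* = {t_star:5.1f}  C(T*) = {c_min:.3f}")
```

Passing a different coefficient, for example `optimal_release(A_HAT, B_HAT, c1=30.0)`, repeats the search for one changed cost coefficient while the others stay at the baseline.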


**Table 9.** Optimal release time of expected total cost according to baseline.

Here, *C*0 is the setup cost. Because it enters the total cost as an additive term, the total cost rises as *C*0 increases; thus, the lower the setup cost, the lower the total cost. Table 10 compares the changes when *C*0 is 300, 500, and 700. The higher the value, the higher the total cost, whereas the optimal time does not change. Therefore, *C*0 does not help determine the optimal release time. However, because a setup cost is required for system stabilization, the *C*0 cost coefficient is set to 500. Figure 6 shows a graph of the results according to the change in *C*0.
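The insensitivity of the optimal time to *C*0 can also be read from the first-order condition of Equation (22). As a sketch, assuming that *m*(*t*) is differentiable, that the minimum is an interior point, and that *R*(*x*|*T*) in Equation (23) is evaluated at the release time *T*, differentiating with respect to *T* gives

$$\frac{dC(T)}{dT} = C\_1 + C\_2 m'(T) + C\_3 R(x|T)\left[m'(T+x) - m'(T)\right] = 0$$

The constant *C*0 vanishes on differentiation, so it shifts the level of the total cost but not the location of its minimum, whereas *C*1, *C*2, and *C*3 all remain in the condition.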

**Table 10.** Optimal release time of expected total cost according to *C*0.


**Figure 6.** Optimal release time of total cost according to *C*0.

Table 11 compares the changes when *C*1 is 10, 20, and 30. When *C*1 is 10, the total cost is minimized at approximately 18.9 to 19.0; when *C*1 is 20, at 18.2 to 18.3; and when *C*1 is 30, at approximately 17.8 to 17.9. As the cost coefficient *C*1 increases, the optimal release time therefore gradually moves earlier. Figure 7 shows a graph of the results according to the changes in *C*1.


**Table 11.** Optimal release time of expected total cost according to *C*1.

Table 12 compares the changes when *C*2 is 30, 40, 50, and 60. The optimal release time remains at 18.2 to 18.3 as the cost coefficient *C*2 changes. Figure 8 shows a graph of the results according to the change in *C*2.

Table 13 compares the changes when *C*3 is 5000, 7000, 10,000, and 15,000. When *C*3 is 5000, the total cost is minimized at approximately 18.2 to 18.3; when it is 7000, at 18.5 to 18.6; when it is 10,000, at approximately 18.9 to 19.0; and when it is 15,000, at approximately 19.2 to 19.3. This indicates that the optimal release time gradually increases as the cost coefficient *C*3 increases. Figure 9 shows a graph of the results according to the changes in *C*3.

**Table 12.** Optimal release time of expected total cost according to *C*2.


**Figure 8.** Optimal release time of the total cost according to *C*2.



**Figure 9.** Optimal release time of the total cost according to *C*3.

#### *4.2. Results of Variation in Cost Model for Changes in Parameter*

In this section, we check whether the optimal release time is affected when the parameters of the proposed model change. The parameters *a*, *b*, *c*, and *h* of the proposed model are each changed by −20%, −10%, 0%, 10%, and 20%, that is, in 10% increments, while the coefficients of the cost model are fixed at the baseline values in Section 4.1. The minimum cost is then calculated for each parameter change, and the corresponding release time is derived. In Table 14, the 0% row is the same as the baseline value in Table 9, obtained by substituting the parameter estimates described in Section 3 and the coefficient values of the cost model proposed in Section 4.
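The perturbation procedure can be sketched as a loop that rescales one parameter at a time and repeats the search for the minimum cost. As the proposed *m*(*t*) is not restated in this section, the sketch again relies on a hypothetical two-parameter stand-in, *m*(*t*) = *a*(1 − e<sup>−*bt*</sup>), with made-up values, so it illustrates the procedure rather than the numbers in Table 14.

```python
import math

def total_cost(T, a, b, c0=500.0, c1=20.0, c2=50.0, c3=5000.0, x=6.0):
    """Equation (22) with a stand-in m(t) = a(1 - exp(-b t)); NOT the proposed model."""
    def m(t):
        return a * (1.0 - math.exp(-b * t))
    reliability = math.exp(-(m(T + x) - m(T)))  # Equation (23)
    return c0 + c1 * T + c2 * m(T) + c3 * (1.0 - reliability)

def optimal_release(a, b, t_max=100.0, step=0.1):
    """Grid search for the T in (0, t_max] that minimizes C(T)."""
    n_steps = int(round(t_max / step))
    grid = [k * step for k in range(1, n_steps + 1)]
    t_star = min(grid, key=lambda T: total_cost(T, a, b))
    return t_star, total_cost(t_star, a, b)

def sweep(base, name, changes=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Rescale one parameter by each relative change; keep the others fixed."""
    rows = []
    for change in changes:
        params = dict(base)
        params[name] = base[name] * (1.0 + change)
        t_star, c_min = optimal_release(**params)
        rows.append((change, t_star, c_min))
    return rows

# Hypothetical baseline parameters, for illustration only.
BASE = {"a": 80.0, "b": 0.2}
for name in BASE:
    for change, t_star, c_min in sweep(BASE, name):
        print(f"{name} {change:+.0%}  T* = {t_star:5.1f}  C(T*) = {c_min:.3f}")
```

In this stand-in, every term of *C*(*T*) that depends on *a* grows with *a*, so the minimum cost rises with *a* regardless of the release time.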


**Table 14.** Optimal release time of cost according to parameter change.

From Table 14 and Figures 10–13, the cost *C*(*T*) increases as parameter *a* increases, whereas the optimal release time *T*<sup>∗</sup> decreases. As parameters *b* and *h* increase, *C*(*T*) increases and *T*<sup>∗</sup> decreases; the change in *h* has only a very slight effect on the optimal release time compared with *b*. As parameter *c* increases, *C*(*T*) and *T*<sup>∗</sup> increase together. Overall, changes in parameter *a* produce by far the largest variation in the minimum cost, and parameter *b* has the greatest influence on the optimal release time.

**Figure 10.** Optimal release time of cost according to *a*.

**Figure 11.** Optimal release time of cost according to *b*.

**Figure 12.** Optimal release time of cost according to *c*.

**Figure 13.** Optimal release time of cost according to *h*.
