1. Introduction
Consider a sequence
representing the
random observations of simulation output from a covariance stationary stochastic process with an unknown mean
and an unknown variance
. For instance,
could represent the waiting time for the
i-th customer in a certain queuing system or the transit time of the
i-th part through a manufacturing line. We let
be the performance measure,
be the point estimator of
, and
be the quality measure associated with using
to estimate
. Song and Schmeiser [
1] showed the following:
where
is the sum of all correlations,
is the sum of all weighted correlations, and
is the lag
correlation of
and
, which satisfies
for
and
to reflect a general correlation structure appropriate for a wide range of stochastic processes, including waiting times in the steady-state M/M/1 queuing system [
2].
Estimating the variance of the sample mean is a fundamental problem in simulation output analysis. It is crucial for the calculation of the confidence interval of the population mean [
3] and the probability of selecting correctly from alternatives available to the decision-makers [
4]. Currie and Chen [
5] gave a tutorial on techniques for analyzing the output of stochastic simulation models, such as methods for determining the optimal warm-up length and number of replications, and on ways of using simulation to compare different systems. Chen [
6] discussed how statistical techniques are applied in simulation output analysis, e.g., initialization bias reduction, tests of independence, and quantile estimation.
Batching [
7] is a well-known technique used to estimate the variance of the sample mean in steady-state simulation. The batching idea is to divide
observations into
batches, each of which has the size
. Traditional batch means estimators [
8,
9,
10] in simulation output analysis assume that the simulation run length or the sample size
is known in advance. These estimators typically require
space. Recently, some advanced methods of combining batch means estimators have been discussed. Song [
11] proposed a rule that linearly combines the two smallest batch size estimators, with an optimal weight that is a function of the sum of all correlations of the data process (
). The simulation results indicated that the Song rule substantially improves the estimation of the variance of the sample mean. Vats et al. [
12] discussed the multivariate output analysis and proposed the multivariate batch means estimator for Markov chain Monte Carlo (MCMC) simulation.
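The batching idea described above can be illustrated with a minimal sketch of the classical nonoverlapping batch means estimator, assuming the run length is known in advance and all observations are held in memory. The function name `nbm_variance_of_mean` and its parameter names are hypothetical and chosen only for this illustration; the sketch is not taken from any of the cited works.

```python
import statistics


def nbm_variance_of_mean(observations, batch_size):
    """Nonoverlapping batch means estimate of the variance of the sample mean.

    Illustrative sketch: the observations are split into complete batches of
    equal size, and any leftover observations at the end are discarded.
    """
    num_batches = len(observations) // batch_size
    if num_batches < 2:
        raise ValueError("need at least two complete batches")
    used = observations[: num_batches * batch_size]
    batch_means = [
        sum(used[j * batch_size:(j + 1) * batch_size]) / batch_size
        for j in range(num_batches)
    ]
    # Sample variance of the batch means, divided by the number of batches.
    return statistics.variance(batch_means) / num_batches


# Four observations in two batches of size two: batch means are 2 and 6.
print(nbm_variance_of_mean([1.0, 3.0, 5.0, 7.0], batch_size=2))  # 4.0
```

Because every observation is kept until the estimate is computed, the memory used by this sketch grows with the run length, which is the limitation that the dynamic methods discussed next are designed to avoid.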
Dynamic batch means (DBM) estimators use Fishman’s idea [
13] of periodically doubling batch sizes as sampling proceeds. DBM estimators [
14,
15,
16] use only finite storage space (i.e., fixed memory), increase batch size dynamically as simulation run length increases, and compute the batch means estimates according to the value of the current batch size. DBM estimators require only
storage space and no knowledge of the simulation run length a priori. Song [
14] proposed the dynamic overlapping batch means estimator (DOBM), which allows users to implement the traditional overlapping batch means estimator (OBM) for
overlapping cases without knowing the sample size in advance, where
. Song [
14] also showed that DOBM and OBM are equivalent in terms of data structure properties using the recursive relationships.
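The doubling idea behind the fixed-memory estimators can be sketched as follows. This is a simplified illustration of periodic batch-size doubling with nonoverlapping batches only; it is not the DBM or DOBM algorithm of the cited works, which also track overlapping batches. The class name `DoublingBatchStore`, the parameter `capacity`, and the choice to collapse lazily (when the next observation arrives at a full store) are assumptions made for this example.

```python
class DoublingBatchStore:
    """Keeps at most `capacity` batch sums; doubles the batch size when full."""

    def __init__(self, capacity):
        if capacity < 2 or capacity % 2:
            raise ValueError("capacity must be an even integer >= 2")
        self.capacity = capacity
        self.batch_size = 1
        self.batch_sums = []      # sums of completed batches
        self.partial_sum = 0.0    # running sum of the batch being filled
        self.partial_count = 0
        self.collapses = 0        # number of times collapsing has occurred

    def add(self, y):
        # Collapse only when a new observation arrives and the store is full.
        if len(self.batch_sums) == self.capacity:
            self._collapse()
        self.partial_sum += y
        self.partial_count += 1
        if self.partial_count == self.batch_size:
            self.batch_sums.append(self.partial_sum)
            self.partial_sum = 0.0
            self.partial_count = 0

    def _collapse(self):
        # Merge adjacent pairs of batches, halving the count and doubling the size.
        self.batch_sums = [
            self.batch_sums[i] + self.batch_sums[i + 1]
            for i in range(0, self.capacity, 2)
        ]
        self.batch_size *= 2
        self.collapses += 1

    def batch_means(self):
        return [s / self.batch_size for s in self.batch_sums]


store = DoublingBatchStore(capacity=4)
for y in range(1, 9):
    store.add(float(y))
print(store.batch_size, store.batch_means())  # 2 [1.5, 3.5, 5.5, 7.5]
```

The store never holds more than `capacity` batch sums, and the run length does not need to be known when the first observation arrives.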
In this study, we modified the DBM algorithm by incorporating a binary tree hierarchy and proposed a binary coding scheme to construct the corresponding data structure. On this basis, we present a closed-form expression for the DBM estimators. To the best of our knowledge, this closed-form expression is original; it defines the estimator through a straightforward algebraic binary relation rather than through recursion.
The remainder of this paper is organized as follows.
Section 2 reviews the background of the traditional batch means and DBM estimators.
Section 3 defines the binary tree hierarchy used in developing the closed-form expression for the DBM estimators.
Section 4 describes the computational effort analysis.
Section 5 concludes the paper.
3. Methods
Trees consisting of nodes and leaves are widely used structures for maintaining or analyzing ordered data [
20,
21]. Binary trees [
22] are characterized by each node having at most two children, namely, the left child and the right child. The following definitions provide insight into the data structure of the DBM algorithm:
Definition 1. Let be a binary tree. A leaf is a left (lower) leaf if it is a left (lower) child of its parent. A leaf is a right (upper) leaf if it is a right (upper) child of its parent.
We built the correspondence between trees and codes in representing as follows: Label every left (lower) edge in a tree with a value of 1 and every right (upper) edge with a value of 0.
Definition 2. Let be the binary codeword. The digit is assigned the value of 1 if it is a left (lower) leaf. The digit is assigned the value of 0 if it is a right (upper) leaf.
In this study,
is a binary digit,
, and
is called the length of the codeword. The binary codeword was introduced to keep the record/information of whether a term
is added to
at step
or not in
Figure 1. A rooted binary code tree corresponds to a binary codeword, as shown in
Figure 2. Each pair of braces in the figure corresponds to a leaf on the tree, and the associated codeword is the binary digits within the braces.
Property 1. Convert each of the binary codewords of to its equivalent base-10 form. The resulting decimal value parallels the value for the suffix of in each step . In practice, we can obtain the binary codeword of directly by converting its corresponding decimal value into base-2 form. This relationship can be expressed as follows: This property can be easily proven by substituting the binary codeword into Equation (8). This process is a simple base-2 to base-10 conversion. We take the binary codewords at step
in
Figure 2 as an example. The eight codewords are 000, 001, 010, 011, 100, 101, 110, and 111. The resulting decimal values are 0, 1, 2, 3, 4, 5, 6, and 7, respectively.
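The conversion in Property 1 is an ordinary base-2 to base-10 conversion and can be checked with a short sketch. The function names `codeword_to_decimal` and `decimal_to_codeword` are hypothetical, and representing a codeword as a tuple of digits with the leftmost digit most significant is an assumption made for this illustration.

```python
def codeword_to_decimal(codeword):
    """Read a tuple of binary digits as a base-2 number (leftmost digit first)."""
    value = 0
    for digit in codeword:
        value = 2 * value + digit
    return value


def decimal_to_codeword(value, length):
    """Inverse conversion: write `value` as `length` binary digits."""
    return tuple((value >> (length - 1 - j)) & 1 for j in range(length))


# All codewords of length three, in the order 000, 001, ..., 111.
codewords = [decimal_to_codeword(v, 3) for v in range(8)]
print([codeword_to_decimal(c) for c in codewords])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Running the conversion in both directions confirms that each codeword of a given length corresponds to exactly one decimal value.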
Property 2. Let be the inverse binary-coded matrix defined as . Convert each of the binary codewords of to its equivalent base-10 form via the inverse binary-coded matrix . The resulting decimal value is exactly the value of in each step . This relationship can be modeled as follows: Proof. The proof of this property can be carried out using the method of mathematical induction.
Initially, we check the first base case for and obtain the result that If , . If , . We can show that the result is true for the first base case, i.e., .
Further, we discuss the second base case for and have If , . If , . If , . If , . We can also show that the property is true for .
In the induction step, we assume that the result is true for the case as well. The inductive hypothesis supposes that That is, .
Moreover, we have to show that it is also true for . When , We then have the result that , where , that is, the value of for a given node is equal to the summation of the value of its parent’s and . If the given node is a right (upper) child of its parent, then ; otherwise, . The formula is therefore true for . □
As both the base cases and the inductive step have been performed, by mathematical induction, the property holds for all the natural numbers .
The explanation and demonstration of this property are further given below. The binary codeword
is defined as a tracking code and keeps the record of whether a term
is added to
at step
or not in the original recursive relationship. The inverse binary-coded matrix
corresponds to the value added to
at step
if it is necessary (i.e.,
). Consequently, the product of
is the value of
. The illustration of this property can be derived with the use of an example with
in
Figure 2. The corresponding eight codewords are 000, 001, 010, 011, 100, 101, 110, and 111. The relationship can be verified by substituting each binary codeword into Equation (9). The resulting decimal values with the inverse binary-coded matrix are 0, 4, 2, 6, 1, 5, 3, and 7, respectively. We take the last codeword, 111, as an example: $1 \cdot 2^0 + 1 \cdot 2^1 + 1 \cdot 2^2 = 7$.
The relationship between the binary codeword and the value of
is clearly demonstrated. This property can also be verified via the recursive equation of
in
Figure 1.
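The example values 0, 4, 2, 6, 1, 5, 3, and 7 are consistent with weighting the digits of a codeword in reverse order, so that the first digit carries weight 1 and the last digit carries the largest power of two. The sketch below checks this reading against a step-by-step construction in which a child inherits its parent's value and adds a power of two when its edge digit is 1. The function names, the tuple representation, and this interpretation of the inverse binary-coded matrix are assumptions inferred from the listed values, not the paper's own notation.

```python
def reversed_weight_value(codeword):
    """Closed form: digit j (0-based, left to right) has weight 2**j."""
    return sum(digit * 2 ** j for j, digit in enumerate(codeword))


def values_by_recursion(length):
    """Build the code tree level by level, as in the inductive argument."""
    nodes = {(): 0}  # root: empty codeword with value 0
    for depth in range(length):
        children = {}
        for codeword, value in nodes.items():
            children[codeword + (0,)] = value               # edge labeled 0
            children[codeword + (1,)] = value + 2 ** depth  # edge labeled 1
        nodes = children
    return nodes


tree = values_by_recursion(3)
ordered = sorted(tree)  # codewords 000, 001, ..., 111
print([tree[c] for c in ordered])  # [0, 4, 2, 6, 1, 5, 3, 7]
assert all(tree[c] == reversed_weight_value(c) for c in ordered)
```

The closed form evaluates any single leaf directly from its codeword, whereas the recursive construction must pass through every earlier level of the tree.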
Theorem 1. The closed-form expression for the DBM estimator with the use of the binary tree code at step can be represented as follows: where , , , , and . Proof. The proof of this theorem is straightforward. On the basis of Property 2, we can reformulate the mathematical form of the DBM estimator using the proposed expression of instead of the original recursive relationships. The closed-form expression for the DBM estimator is then obtained accordingly. □
4. Results
In this section, we describe the computational effort analysis for the recursive expression of Song [
14] and the closed-form expression proposed in this study. Because the data structure stored in the DBM depends on the total number of times that collapsing has occurred, we performed the computational effort analysis under the condition that the sample size n and the prespecified storage space g are given.
Proposition 1. Given that the sample size and the storage space are n and g, respectively, the computational complexity in obtaining for the recursive expression of the DBM estimator is .
The computational effort in obtaining for the recursive expression is proportional to the total number of times that collapsing has occurred because the DBM algorithm has to update the values of whenever collapsing occurs in the procedure. In other words, the final values of must be calculated from the recursive equation, from the first step to the last step . If the simulation run length is n and the memory parameter is g, then the total number of times that collapsing has occurred is Therefore, the computational effort in the recursive expression is .
Proposition 2. Given that the sample size and the storage space are n and g, respectively, the computational complexity in obtaining for the closed-form expression of the DBM estimator is O(1).
We have to examine the computational effort needed in Equation (9) to determine the complexity in computing for the closed-form expression. If the simulation run length is n and the memory parameter is g, then we can determine the binary codewords based on the final value of the parameter , where . The values of are then obtained by substituting the binary codewords into Equation (9) directly, which requires only a single matrix multiplication with elements for each matrix. More precisely, there are multiplication operations and addition operations in the matrix multiplication, i.e., in total, operations are needed for Equation (9). However, once the simulation run length n and the memory parameter g are given, the parameter is determined and is a fixed constant. Therefore, the computational complexity is O(1).
On the basis of the results in Propositions 1 and 2, we can conclude that the computational effort in the closed-form expression for obtaining the indexes , i.e., the batch mean shifts , is less than the effort needed in the recursive expression.