An Insight into the Data Structure of the Dynamic Batch Means Algorithm with Binary Tree Code

Chih, Mingchang

doi:10.3390/math7090791

by

Mingchang Chih

Department of Business Administration, National Chung Hsing University, 145 Xingda Road, Taichung 40227, Taiwan

Mathematics 2019, 7(9), 791; https://doi.org/10.3390/math7090791

Submission received: 30 June 2019 / Revised: 7 August 2019 / Accepted: 22 August 2019 / Published: 30 August 2019

Abstract

Batching is a well-known method used to estimate the variance of the sample mean in steady-state simulation. Dynamic batching is a novel technique for implementing traditional batch means estimators without knowing the simulation run length a priori. In this study, we reinvestigated the dynamic batch means (DBM) algorithm with a binary tree hierarchy and proposed a binary coding idea to construct the corresponding data structure. We also present a closed-form expression for the DBM estimator based on this binary tree coding idea. The closed-form expression is defined through a straightforward algebraic binary relation. Given that the sample size and storage space are known in advance, we show that the computational complexity of the closed-form expression for obtaining the indexes

c_{j}^{(k)}

, i.e., the batch mean shifts

s

, is lower than that of the recursive expression.

Keywords:

estimation; simulation; variance of the sample mean; dynamic batch means; binary tree code

1. Introduction

Consider a sequence

{Y_{1}, Y_{2}, \dots, Y_{n}}

representing the

n

random observations of simulation output from a covariance stationary stochastic process with an unknown mean

μ = E (Y_{i})

and an unknown variance

σ^{2} = var (Y_{i})

. For instance,

Y_{i}

could represent the waiting time for the i-th customer in a certain queuing system or the transit time of the i-th part through a manufacturing line. We let

μ

be the performance measure,

{\overset{=}{Y}}_{n} = \sum_{i = 1}^{n} Y_{i} / n

be the point estimator of

μ

, and

var ({\overset{=}{Y}}_{n})

be the quality measure associated with using

{\overset{=}{Y}}_{n}

to estimate

μ

. Song and Schmeiser [1] showed the following:

n var ({\overset{=}{Y}}_{n}) = γ_{0} σ^{2} - \frac{γ_{1} σ^{2}}{n} + o (n^{- 1})

(1)

where

γ_{0} = 1 + 2 \sum_{h = 1}^{\infty} ρ_{h}

is the sum of all correlations,

γ_{1} = 2 \sum_{h = 1}^{\infty} h ρ_{h}

is the sum of all weighted correlations, and

ρ_{h} = corr (Y_{i}, Y_{i + h})

is the lag

h

correlation of

Y_{i}

and

Y_{i + h}

, which satisfies

ρ_{h} = O (δ^{h})

for

h = 1, 2, \dots,

and

δ \in (0, 1)

to reflect a general correlation structure appropriate for a wide range of stochastic processes, including waiting times in the steady-state M/M/1 queuing system [2].
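As a hedged illustration of Equation (1), consider a process whose lag-h correlation is exactly phi**h. This AR(1)-type choice, the parameter name phi, and the function names below are assumptions made for illustration and are not taken from the paper. In that case the two correlation sums have simple closed forms, and the exact value of n var(sample mean) can be compared with the two leading terms of Equation (1):

```python
def gamma0(phi):
    """Sum of all correlations: 1 + 2 * sum_{h>=1} phi**h."""
    return (1 + phi) / (1 - phi)


def gamma1(phi):
    """Sum of all weighted correlations: 2 * sum_{h>=1} h * phi**h."""
    return 2 * phi / (1 - phi) ** 2


def n_var_exact(n, phi, sigma2=1.0):
    """Exact n * var(sample mean) when rho_h = phi**h (assumed correlation structure)."""
    s = sum((1 - h / n) * phi ** h for h in range(1, n))
    return sigma2 * (1 + 2 * s)


def n_var_approx(n, phi, sigma2=1.0):
    """Two leading terms of Equation (1): gamma0 * sigma^2 - gamma1 * sigma^2 / n."""
    return gamma0(phi) * sigma2 - gamma1(phi) * sigma2 / n


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, n_var_exact(n, 0.5), n_var_approx(n, 0.5))
```

Under this assumed structure the gap between the exact value and the approximation shrinks geometrically in n, which is consistent with the o(1/n) remainder in Equation (1).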

Estimating the variance of the sample mean is a fundamental problem in simulation output analysis. It is crucial for calculating the confidence interval of the population mean [3] and the probability of correctly selecting among the alternatives available to decision-makers [4]. Currie and Cheng [5] gave a tutorial on techniques for analyzing the output of stochastic simulation models, such as methods for determining the optimal warm-up length and number of replications, and on ways of using simulation to compare different systems. Chen [6] discussed how statistical techniques are applied in simulation output analysis, e.g., initialization bias reduction, tests of independence, and quantile estimation.

Batching [7] is a well-known technique used to estimate the variance of the sample mean in steady-state simulation. The batching idea is to divide

n

observations into

b

batches, each of which has the size

m

. Traditional batch means estimators [8,9,10] in simulation output analysis assume that the simulation run length or the sample size

n

is known in advance. The storage requirements for these estimators typically require

O (n)

space. Recently, some advanced methods of combining batch means estimators have been discussed. Song [11] proposed a rule that linearly combines two smallest batch size estimators and uses the optimal weight as the function of the sum of all correlations of the data process (

γ_{0}

). The simulation results concluded that the Song rule provides significant advancements in estimating the variance of the sample mean. Vats et al. [12] discussed the multivariate output analysis and proposed the multivariate batch means estimator for Markov chain Monte Carlo (MCMC) simulation.

Dynamic batch means (DBM) estimators use Fishman’s idea [13] of periodically doubling batch sizes as sampling procedures. DBM estimators [14,15,16] use only finite storage space (i.e., fixed memory), increase batch size dynamically as simulation run length increases, and compute the batch means estimates according to the value of the current batch size. DBM estimators require only

O (1)

storage space and no knowledge of the simulation run length a priori. Song [14] proposed the dynamic overlapping batch means estimator (DOBM), which allows users to implement the traditional overlapping batch means estimator (OBM) for

(100 f) %

overlapping cases without knowing the sample size in advance, where

f = 0, 1 / 2, 3 / 4, 7 / 8, \dots

. Song [14] also showed that DOBM and OBM are equivalent in terms of data structure properties using the recursive relationships.

In this study, we modified the DBM algorithm by incorporating a binary tree hierarchy and proposed a binary coding idea to construct the corresponding data structure. On this basis, we present a closed-form expression for the DBM estimators. To the best of our knowledge, this closed-form expression is original; it defines the estimator via a straightforward algebraic binary relation.

The remainder of this paper is organized as follows. Section 2 reviews the background of the traditional batch means and DBM estimators. Section 3 defines the binary tree hierarchy in developing the closed-form expression for the DBM estimators. Section 4 describes the computational effort analysis. Section 5 concludes the paper.

2. Background

2.1. Batch Means Estimators

Batching divides observations into batches (

b

batches, with each batch having the size

m

, i.e.,

n = b m

) and uses these batches as the basic data units for analysis. Song and Schmeiser [1] summarized and defined batch means estimators, including nonoverlapping batch means (NBM) [9], OBM [8], partial-overlapping batch means (PBM) [10], and spaced batch means (SBM) [17], as a special case of the following function of

m

and

s

:

{\hat{V}}_{BM} (m, s) = \frac{\sum_{i = 1}^{b} {({\bar{Y}}_{s (i - 1) + 1, m} - {\overset{=}{Y}}_{n})}^{2}}{d}

(2)

where

1 \leq s \leq n - m

is the distance (or shift) between the first observation of any two adjacent batches,

d = b (n m^{- 1} - 1)

,

b = \lfloor (n - m + s) / s \rfloor

is the number of batches (where

⌊ x ⌋

is the greatest integer smaller than or equal to

x

),

{\bar{Y}}_{s (i - 1) + 1, m} = \sum_{j = 1}^{m} \frac{Y_{s (i - 1) + j}}{m}

(3)

is the i-th batch mean, and

Y_{s (i - 1) + j}

is the j-th observation in the i-th batch.

The NBM estimator with batch size

m

is the special case obtained when

s = m

. The OBM estimator with batch size

m

is the special case obtained when

s = 1

. The PBM estimator with batch size

m

is the special case obtained when

1 < s < m

. The

(100 f) %

OBM,

0 < f < 1

, another form of PBM, indicates that

(100 f) %

overlap exists among all data between two adjacent batches, where

f = 1 - (s / m)

and

1 \leq s \leq m

(i.e.,

0 \leq f < 1

). For example, 75% OBM is the

{\hat{V}}_{BM} (m, s)

estimator for

s = m / 4

. The SBM estimator with batch size

m

is the special case obtained when

s > m

.
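The family of estimators in Equation (2) can be sketched in a few lines. The function name and the plain-list input are assumptions for illustration; the divisor d and the batch count b follow the definitions given after Equation (2), with b rounded down to an integer.

```python
def batch_means_variance(y, m, s):
    """Estimate var(sample mean) via Equation (2) with batch size m and shift s."""
    n = len(y)
    b = (n - m + s) // s              # number of batches, rounded down
    d = b * (n / m - 1)               # divisor d = b (n / m - 1)
    grand_mean = sum(y) / n
    total = 0.0
    for i in range(1, b + 1):
        start = s * (i - 1)           # 0-based position of Y_{s(i-1)+1}
        batch_mean = sum(y[start:start + m]) / m
        total += (batch_mean - grand_mean) ** 2
    return total / d


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print("NBM (s = m):", batch_means_variance(data, m=2, s=2))
    print("OBM (s = 1):", batch_means_variance(data, m=2, s=1))
```

Choosing s = m gives NBM, s = 1 gives OBM, and s = m / 4 gives the 75% OBM case mentioned above. Note that this direct form keeps all n observations in memory, which is the storage cost that the dynamic variants avoid.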

2.2. Dynamic Batch Means Estimators

Yeh and Schmeiser [18] were the first to introduce the dynamic batching idea into NBM. Song [14] further proposed a general form for the

(100 f) %

DOBM estimator using a recursive expression, where

f = 0, 1 / 2, 3 / 4, 7 / 8, \dots

. Several DBM variations [15,16,19] have been recently developed for steady-state simulation output analysis.

In DBM, all data (the data here may be the sum of several consecutive observations) are stored in a vector

\underline{L}

with size

2 w g

(i.e.,

\underline{L} = [{\underline{A}}_{1}, {\underline{A}}_{2}, \dots, {\underline{A}}_{w}]

,

{\underline{A}}_{j} = [A_{j} (1), A_{j} (2), \dots, A_{j} (2 g)]

,

j = 1, 2, \dots, w

), where

w

and

g

are two prespecified parameters:

w

is the number of subvectors in

\underline{L}

, and

2 g

is the number of cells (memory size) for each subvector

{\underline{A}}_{j}

,

j = 1, 2, \dots, w

. The key idea behind the DBM involves collapsing the w vectors. DBM combines two particular adjacent batches stored in

\underline{L}

when

\underline{L}

is full. This combining scheme is called “collapsing”. Instead of keeping each individual observation, DBM stores the sum of the observations for each batch. DBM adds new observations in the current cell(s) if the number of observations stored is less than the present batch size

m = 2^{k}

at stage

k

. Whenever the first subvector

{\underline{A}}_{1}

in

\underline{L}

is full, DBM starts to collapse data. Initially,

k

is set to 0 and is increased by 1 after collapsing. Collapsing starts with the subvector

{\underline{A}}_{J}

, where

J = \min \{2^{k}, w\}

and

J

could be regarded as the number of subvectors used at step

k

. Following collapsing in

{\underline{A}}_{J}

, the subsequent collapsing occurs in subvector

{\underline{A}}_{J - 1}

and other subvectors. The subvectors in

\underline{L}

are updated as follows:

A_{j}^{(k)} (i) = \begin{cases} A_{j / 2}^{(k - 1)} (2 i) + A_{j / 2}^{(k - 1)} (2 i + 1), & i = 1, \dots, g - 1; \; j \text{ is even}; \\ A_{j / 2}^{(k - 1)} (2 g), & i = g; \; j \text{ is even}; \\ A_{(j + 1) / 2}^{(k - 1)} (2 i - 1) + A_{(j + 1) / 2}^{(k - 1)} (2 i), & i = 1, \dots, g - 1; \; j \text{ is odd}. \end{cases}

(4)

where

A_{j}^{(k)} (i)

,

i = 1, 2, \dots, 2 g

,

j = 1, 2, \dots, w

, is the i-th cell of subvector

{\underline{A}}_{j}

at step

k

and where the numerical values (batch sums) are stored, that is, the observations in certain pairs of adjacent batches after collapsing are combined into one batch sum. As a result, half of the vector becomes available to contain new observations.
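The collapsing idea can be illustrated with a deliberately simplified sketch that uses a single subvector (w = 1) and therefore covers only the nonoverlapping case. It is not an implementation of the full update rule in Equation (4): the class name, the attribute names, and the choice to collapse all 2g cells pairwise as soon as the vector fills are assumptions made for illustration.

```python
class DynamicNBM:
    """Simplified dynamic batching with one subvector of 2g cells (assumes w = 1)."""

    def __init__(self, g):
        self.g = g
        self.cells = []          # completed batch sums, at most 2g of them
        self.k = 0               # number of collapses so far
        self.m = 1               # current batch size, m = 2**k
        self._partial_sum = 0.0  # running sum of the batch being filled
        self._partial_count = 0

    def add(self, y):
        """Add one observation; collapse adjacent pairs when the vector is full."""
        self._partial_sum += y
        self._partial_count += 1
        if self._partial_count < self.m:
            return
        self.cells.append(self._partial_sum)
        self._partial_sum, self._partial_count = 0.0, 0
        if len(self.cells) == 2 * self.g:
            # Combine cells (1,2), (3,4), ... into g batch sums of doubled size.
            self.cells = [self.cells[2 * i] + self.cells[2 * i + 1]
                          for i in range(self.g)]
            self.k += 1
            self.m *= 2


if __name__ == "__main__":
    dbm = DynamicNBM(g=2)
    for value in range(1, 17):
        dbm.add(float(value))
    print(dbm.cells, dbm.m, dbm.k)
```

Storage stays bounded by 2g batch sums regardless of how many observations arrive, and half of the vector is freed after each collapse, which is the behavior described above.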

Song [14] summarized and defined DBM estimators as a function of

m

and

s

at step

k

, as follows:

{\hat{V}}_{DBM} (m, s) = d^{- 1} \sum_{j = 1}^{2^{z}} \sum_{i = 1}^{b_{j}} {(\frac{A_{j}^{(k)} (i)}{m} - {\overset{=}{Y}}_{n})}^{2}

(5)

where

A_{j}^{(k)} (i)

is defined in Equation (4). Song [14] derived a proposition to show that DBM is equivalent to the traditional BM for some cases via a recursive expression:

A_{j}^{(k)} (i) = \sum_{t = 1}^{m} Y_{(i - 1) m + c_{j}^{(k)} + t}

(6)

where

i = 1, 2, \dots, {\dot{b}}_{j}

,

j = 1, 2, \dots, 2^{z}

,

{\dot{b}}_{j} = γ_{l} - 1 + m_{l} / 2^{k}

,

l = 1, 2, \dots, w

,

γ_{l}

denotes the cells used to store the latest observations,

m_{l}

is the number of observations stored in

A_{j}^{(k)} (i)

,

z = \min \{k, \log_{2} w\}

, and

c_{j}^{(k)} = \begin{cases} c_{1 + (j - 1) / 2}^{(k - 1)}, & j = 1, 3, \dots, 2^{z} - 1, \\ c_{j - 1}^{(k)} + 2^{k - 1}, & j = 2, 4, \dots, 2^{z} . \end{cases}

(7)

The values of

c_{j}^{(k)}

are illustrated in Figure 1. The column corresponding to each

k

contains

c_{j}^{(k)}

,

j = 1, 2, \dots, 2^{z}

. For example, if

w = 8 = 2^{3}

and

k = 3

, then

z = 3

and the corresponding values of

c_{j}^{(3)}

,

j = 1, 2, \dots, 8

, are 0, 4, 2, 6, 1, 5, 3, 7, which are listed in the fourth column from the left-hand side (

k = 3

). Therefore, the corresponding first observations stored in cells

A_{j}^{(3)} (1)

,

j = 1, 2, \dots, 8

(shown in the second column from the right-hand side in Figure 1 in Song [14]) are

y_{1}, y_{5}, y_{3}, y_{7}, y_{2}, y_{6}, y_{4}

, and

y_{8}

.
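The recursion in Equation (7) can be traced step by step with a short sketch. Two assumptions are made here and are not stated explicitly in the text: the recursion starts from a single index equal to 0 at step 0, and k does not exceed log2(w), so that z = k. The function name is illustrative.

```python
def c_recursive(k):
    """Return [c_1, ..., c_{2^k}] at step k by iterating Equation (7).

    Assumes the starting value c_1 = 0 at step 0 and k <= log2(w), so z = k.
    """
    c = [0]                                   # step 0
    for step in range(1, k + 1):
        nxt = []
        for j in range(1, 2 ** step + 1):     # 1-based j, as in Equation (7)
            if j % 2 == 1:
                # odd j: copy c_{1 + (j-1)/2} from the previous step
                nxt.append(c[(j - 1) // 2])
            else:
                # even j: c_{j-1} at the current step plus 2^(step-1)
                nxt.append(nxt[j - 2] + 2 ** (step - 1))
        c = nxt
    return c


if __name__ == "__main__":
    for k in range(4):
        print(k, c_recursive(k))
```

For k = 3 the sketch reproduces the values 0, 4, 2, 6, 1, 5, 3, 7 quoted above, and it makes the drawback visible: every earlier step has to be computed before step k is available.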

The main drawback of the recursive expression for DBM is the difficulty in calculating the values of

c_{j}^{(k)}

. Considering that Equation (7) is a recursive equation, we have to go through all the values of

k

(i.e.,

k = 0, 1, 2, \dots

) to eventually obtain the values of

c_{j}^{(k)}

. In the subsequent section, we propose a closed-form expression for the DBM estimator using a binary tree coding idea.

3. Methods

Trees consisting of nodes and leaves are widely used structures for maintaining or analyzing ordered data [20,21]. In the binary trees [22] considered here, each internal (non-leaf) node has exactly two children, namely, the left child and the right child. The following definitions provide an insight into the data structure of the DBM algorithm:

Definition 1.

Let

T

be a binary tree. A leaf is a left (lower) leaf if it is a left (lower) child of its parent. A leaf is a right (upper) leaf if it is a right (upper) child of its parent.

We built the correspondence between trees and codes in representing

c_{j}^{(k)}

as follows: Label every left (lower) edge in a tree

T

with a value of 1 and every right (upper) edge with a value of 0.

Definition 2.

Let

x_{j}^{k} = (x_{j}^{k} (1), x_{j}^{k} (2), \dots, x_{j}^{k} (k))

be the binary codeword. The digit

x_{j}^{k} (i)

is assigned the value of 1 if it is a left (lower) leaf. The digit

x_{j}^{k} (i)

is assigned the value of 0 if it is a right (upper) leaf.

In this study,

x_{j}^{k} (i)

is a binary digit,

1 \leq i \leq k

, and

k

is called the length of the codeword. The binary codeword was introduced to keep the record/information of whether a term

2^{k - 1}

is added to

c_{j}^{(k)}

at step

k

or not in Figure 1. A rooted binary code tree corresponds to a binary codeword, as shown in Figure 2. Each pair of braces in the figure corresponds to a leaf on the tree, and the associated codeword is the binary digits within the braces.

Property 1.

Convert each of the binary codewords of

c_{j}^{(k)}

to its equivalent base-10 form. The resulting decimal value parallels the value

(j - 1)

for the suffix of

c_{j}^{(k)}

in each step

k

. In practice, we can obtain the binary codeword of

c_{j}^{(k)}

directly by converting its corresponding decimal value

(j - 1)

into base-2 form. This relationship can be expressed as follows:

[x_{j}^{k} (1), x_{j}^{k} (2), \dots, x_{j}^{k} (k - 1), x_{j}^{k} (k)] \cdot {[2^{k - 1}, 2^{k - 2}, \dots, 2^{1}, 2^{0}]}^{t} = j - 1

(8)

This property can be easily proven by substituting the binary codeword into Equation (8). This process is a simple base-2 to base-10 conversion. We take the binary codewords at step

k = 3

in Figure 2 as an example. The eight codewords are

{000}

,

{001}

,

{010}

,

{011}

,

{100}

,

{101}

,

{110}

, and

{111}

. The resulting decimal values are 0, 1, 2, 3, 4, 5, 6, and 7, respectively.
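Property 1 can be checked with a small sketch that grows the code tree level by level. It assumes that each node passes its codeword to both children, appending 0 for the right (upper) child and 1 for the left (lower) child, and that the upper child is listed first; the function names are illustrative.

```python
def tree_codewords(k):
    """Codewords of the 2**k leaves at depth k, upper child listed before lower child."""
    words = [[]]                          # the root carries the empty codeword
    for _ in range(k):
        # each node spawns an upper child (digit 0) and a lower child (digit 1)
        words = [w + [digit] for w in words for digit in (0, 1)]
    return words


def to_decimal(word):
    """Left-hand side of Equation (8): weights 2^(k-1), ..., 2^1, 2^0."""
    k = len(word)
    return sum(x * 2 ** (k - 1 - i) for i, x in enumerate(word))


if __name__ == "__main__":
    for j, word in enumerate(tree_codewords(3), start=1):
        print(j, word, to_decimal(word))   # to_decimal(word) equals j - 1
```

Read in this order, the j-th codeword is simply the k-digit binary form of j - 1, which is why the codewords can be written down directly without traversing the tree.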

Property 2.

Let

\underline{B}

be the inverse binary-coded matrix defined as

\underline{B} \equiv {[2^{0}, 2^{1}, \dots, 2^{k - 1}]}^{t}

. Convert each of the binary codewords of

c_{j}^{(k)}

to its equivalent base-10 form via the inverse binary-coded matrix

\underline{B}

. The resulting decimal value is exactly the value of

c_{j}^{(k)}

in each step

k

. This relationship can be modeled as follows:

c_{j}^{(k)} = x_{j}^{k} \cdot \underline{B} = [x_{j}^{k} (1), x_{j}^{k} (2), \dots, x_{j}^{k} (k)] \cdot {[2^{0}, 2^{1}, \dots, 2^{k - 1}]}^{t}

(9)

Proof.

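The proof proceeds by mathematical induction. Throughout the proof, and in Equation (10), the subscript j is zero-based (j = 0, 1, …, 2^k − 1); that is, it equals j − 1 in the one-based indexing of Equation (7).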

Initially, we check the first base case for

k = 1

and obtain the result that

c_{j}^{(k = 1)} = x_{j}^{k = 1} \cdot \underline{B} = [x_{j}^{k = 1} (1)] \cdot [2^{0}] = 2^{0} \cdot x_{j}^{k = 1} (1), j = 0, 1 .

If

j = 0

,

c_{j = 0}^{(k = 1)} = 2^{0} \cdot 0 = 0

. If

j = 1

,

c_{j = 1}^{(k = 1)} = 2^{0} \cdot 1 = 1

. We can show that the result is true for the first base case, i.e.,

k = 1

.

Further, we discuss the second base case for

k = 2

and have

c_{j}^{(k = 2)} = x_{j}^{k = 2} \cdot \underline{B} = [x_{j}^{k = 2} (1), x_{j}^{k = 2} (2)] \cdot {[2^{0}, 2^{1}]}^{t} = 2^{0} \cdot x_{j}^{k = 2} (1) + 2^{1} \cdot x_{j}^{k = 2} (2), j = 0, 1, 2, 3 .

If

j = 0

,

c_{j = 0}^{(k = 2)} = 2^{0} \cdot 0 + 2^{1} \cdot 0 = 0

. If

j = 1

,

c_{j = 1}^{(k = 2)} = 2^{0} \cdot 0 + 2^{1} \cdot 1 = 2

. If

j = 2

,

c_{j = 2}^{(k = 2)} = 2^{0} \cdot 1 + 2^{1} \cdot 0 = 1

. If

j = 3

,

c_{j = 3}^{(k = 2)} = 2^{0} \cdot 1 + 2^{1} \cdot 1 = 3

. We can also show that the property is true for

k = 2

.

In the induction step, we assume that the result is true for the case

k = n

as well. The inductive hypothesis supposes that

c_{j}^{(k = n)} = x_{j}^{k = n} \cdot \underline{B} = [x_{j}^{k = n} (1), x_{j}^{k = n} (2), \dots, x_{j}^{k = n} (n)] \cdot {[2^{0}, 2^{1}, \dots, 2^{n - 1}]}^{t} = 2^{0} \cdot x_{j}^{k = n} (1) + 2^{1} \cdot x_{j}^{k = n} (2) + \dots + 2^{n - 1} \cdot x_{j}^{k = n} (n), j = 0, 1, \dots, 2^{k} - 1 .

That is,

c_{j}^{(k = n)} = \sum_{i = 1}^{n} 2^{i - 1} \cdot x_{j}^{k = n} (i), j = 0, 1, \dots, 2^{k} - 1

.

Moreover, we have to show that it is also true for

k = n + 1

. When

k = n + 1

,

c_{j}^{(k = n + 1)} = x_{j}^{k = n + 1} \cdot \underline{B} = [x_{j}^{k = n + 1} (1), x_{j}^{k = n + 1} (2), \dots, x_{j}^{k = n + 1} (n), x_{j}^{k = n + 1} (n + 1)] \cdot {[2^{0}, 2^{1}, \dots, 2^{n - 1}, 2^{n}]}^{t} = 2^{0} \cdot x_{j}^{k = n + 1} (1) + 2^{1} \cdot x_{j}^{k = n + 1} (2) + \dots + 2^{n - 1} \cdot x_{j}^{k = n + 1} (n) + 2^{n} \cdot x_{j}^{k = n + 1} (n + 1), j = 0, 1, \dots, 2^{k} - 1 .

We then have the result that

c_{j}^{(k = n + 1)} = \sum_{i = 1}^{n} 2^{i - 1} \cdot x_{j}^{k = n + 1} (i) + 2^{n} \cdot x_{j}^{k = n + 1} (n + 1) = c_{j}^{(k = n)} + 2^{n} \cdot x_{j}^{k = n + 1} (n + 1)

, where

j = 0, 1, \dots, 2^{k} - 1

, that is, the value of

c_{j}^{(k = n + 1)}

for a given node is equal to the summation of the value of its parent’s

c_{j}^{(k = n)}

and

2^{n} \cdot x_{j}^{k = n + 1} (n + 1)

. If the given node is a right (upper) child of its parent, then

x_{j}^{k = n + 1} (n + 1) = 0

; otherwise,

x_{j}^{k = n + 1} (n + 1) = 1

. The formula is therefore true for

k = n + 1

. □

As both the base cases and the inductive step have been performed, by mathematical induction, the property holds for all the natural numbers

n

.

The explanation and demonstration of this property are further given below. The binary codeword

x_{j}^{k}

is defined as a tracking code and keeps the record of whether a term

2^{k - 1}

is added to

c_{j}^{(k)}

at step

k

or not in the original recursive relationship. The inverse binary-coded matrix

\underline{B}

corresponds to the value added to

c_{j}^{(k)}

at step

k

if it is necessary (i.e.,

x_{j}^{k} (i) = 1

). Consequently, the product of

x_{j}^{k} \cdot \underline{B}

is the value of

c_{j}^{(k)}

. The illustration of this property can be derived with the use of an example with

k = 3

in Figure 2. The corresponding eight codewords are

{000}

,

{001}

,

{010}

,

{011}

,

{100}

,

{101}

,

{110}

, and

{111}

. The relationship can be proven by substituting the binary codeword into Equation (9). The resulting decimal values with inverse binary-coded matrix

\underline{B}

are 0, 4, 2, 6, 1, 5, 3, and 7. We take the last codeword

{111}

as an example:

c_{7}^{(3)} = x_{7}^{3} \cdot \underline{B} = [x_{7}^{3} (1), x_{7}^{3} (2), x_{7}^{3} (3)] \cdot {[2^{0}, 2^{1}, 2^{2}]}^{t} = [1, 1, 1] \cdot {[1, 2, 4]}^{t} = 7

(10)

The relationship between the binary codeword and the value of

c_{j}^{(k)}

is clearly demonstrated. The verification of this property could also be checked and confirmed via the recursive equation of

c_{j}^{(k)}

in Figure 1.
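A minimal sketch of Equation (9) follows. It uses the one-based subscript j of Equation (7), so the codeword is the k-digit binary form of j - 1 (Property 1) and the codeword 111 of Equation (10) corresponds to j = 8 here. The function names are assumptions for illustration.

```python
def codeword(j, k):
    """Binary codeword for one-based j at step k: the k-digit base-2 form of j - 1."""
    if k == 0:
        return []
    return [int(ch) for ch in format(j - 1, "b").zfill(k)]


def c_closed_form(j, k):
    """Equation (9): dot product of the codeword with B = [2^0, 2^1, ..., 2^(k-1)]."""
    return sum(x * 2 ** i for i, x in enumerate(codeword(j, k)))


if __name__ == "__main__":
    k = 3
    print([c_closed_form(j, k) for j in range(1, 2 ** k + 1)])
```

Because the weights in B run in the opposite order to those in Equation (8), the result is the bit reversal of j - 1. Each index is obtained on its own, with no dependence on the values at earlier steps.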

Theorem 1.

The closed-form expression for the DBM estimator with the use of the binary tree code at step

k

can be represented as follows:

{\hat{V}}_{DBM} (m, s) = d^{- 1} \sum_{j = 1}^{2^{z}} \sum_{i = 1}^{b_{j}} {(\frac{A_{j}^{(k)} (i)}{m} - {\overset{=}{Y}}_{n})}^{2}

(11)

where

A_{j}^{(k)} (i) = \sum_{t = 1}^{m} Y_{(i - 1) m + c_{j}^{(k)} + t}

,

c_{j}^{(k)} = x_{j}^{k} \cdot \underline{B}

,

i = 1, 2, \dots, b_{j}

,

j = 1, 2, \dots, 2^{z}

, and

z = \min \{k, \log_{2} w\}

.

Proof.

The proof of this theorem is straightforward. On the basis of Property 2, we can reformulate the mathematical form of the DBM estimator using the proposed expression of

c_{j}^{(k)}

instead of the original recursive relationships. The closed-form expression for the DBM estimator is then obtained accordingly. □
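To connect the closed-form shifts with the stored batch sums, the sketch below evaluates the batch-sum formula of Theorem 1 directly from raw observations, with m = 2^k as stated for stage k. It assumes the observations are available in a list whose first element is Y_1, which the finite-memory algorithm itself would not require; the function names are illustrative.

```python
def shift(j, k):
    """c_j at step k via the closed form: reverse the k-bit binary form of j - 1."""
    if k == 0:
        return 0
    bits = format(j - 1, "b").zfill(k)
    return int(bits[::-1], 2)


def batch_sum(y, i, j, k):
    """Batch sum A_j(i) at step k: m = 2**k consecutive observations after the shift."""
    m = 2 ** k
    start = (i - 1) * m + shift(j, k)      # 0-based position of Y_{(i-1)m + c + 1}
    if start + m > len(y):
        raise IndexError("batch extends past the available observations")
    return sum(y[start:start + m])


if __name__ == "__main__":
    y = list(range(1, 25))                 # Y_1, ..., Y_24 with Y_t = t
    print([y[shift(j, 3)] for j in range(1, 9)])   # first observation of each A_j(1)
    print(batch_sum(y, 1, 1, 3), batch_sum(y, 1, 2, 3))
```

With Y_t = t, the first observations of the eight subvectors at k = 3 come out as 1, 5, 3, 7, 2, 6, 4, 8, matching the example given for Figure 1.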

4. Results

In this section, we describe the computation effort analysis for the recursive expression of Song [14] and the closed-form expression proposed in this study. Because the data structure stored in the DBM depends on the total number of times that collapsing has occurred, we performed the computation effort analysis under the condition that the sample size

n

and the prespecified storage space

g

are given.

Proposition 1.

Given that the sample size and the storage space are

n

and

g

, respectively, the computational complexity in obtaining

c_{j}^{(k)}

for the recursive expression of the DBM estimator is

O (\log_{2} n / (g - 1))

.

The computational effort in obtaining

c_{j}^{(k)}

for the recursive expression is proportional to the total number of times that collapsing has occurred because the DBM algorithm has to update the values of

c_{j}^{(k)}

accordingly when collapsing occurs in the procedure. In other words, the final values of

c_{j}^{(k)}

must be calculated based on the recursive equation from first step

k = 0

to last step

k

. If the simulation run length is n and the memory parameter is g, then the total number of times that collapsing has occurred is

k = \log_{2} n / (g - 1) .

Therefore, the computational effort in the recursive expression is

O (\log_{2} n / (g - 1))

.

Proposition 2.

Given that the sample size and the storage space are

n

and

g

, respectively, the computational complexity in obtaining

c_{j}^{(k)}

for the closed-form expression of the DBM estimator is O(1).

We have to examine the computational effort needed in Equation (9) to determine the complexity in computing

c_{j}^{(k)}

for the closed-form expression. If the simulation run length is

n

and the memory parameter is

g

, then we can determine the binary codewords based on the final value of the parameter

k

, where

k = \log_{2} n / (g - 1)

. The values of

c_{j}^{(k)}

are consequently obtained by substituting the binary codewords into Equation (9) directly, which requires only a single matrix multiplication operation with

k

elements for each matrix. More precisely, there are

k

multiplication operations and

k - 1

addition operations in the matrix multiplication, i.e., in total,

2 k - 1

operations are needed for Equation (9). However, once the simulation run length

n

and the memory parameter

g

are given and known, the parameter

k

is then determined and it is a fixed constant. Therefore, this result leads to the conclusion that the computational complexity is O(1).
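The operation count quoted above can be reproduced with a small sketch that evaluates Equation (9) for one codeword while tallying multiplications and additions. The function name is illustrative, and the powers of two are treated as precomputed entries of B, so they are not counted; both choices are assumptions.

```python
def closed_form_with_counts(word):
    """Evaluate Equation (9) for one codeword and count the arithmetic operations."""
    weights = [2 ** i for i in range(len(word))]   # entries of B, assumed precomputed
    mults = adds = 0
    value = 0
    for i, (x, b) in enumerate(zip(word, weights)):
        term = x * b            # one multiplication per codeword digit
        mults += 1
        if i == 0:
            value = term
        else:
            value += term       # one addition per digit after the first
            adds += 1
    return value, mults, adds


if __name__ == "__main__":
    value, mults, adds = closed_form_with_counts([1, 1, 1])
    print(value, mults, adds, mults + adds)
```

For a codeword of length k the tally is k multiplications and k - 1 additions, i.e., 2k - 1 operations per index, in line with the count stated for Equation (9).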

On the basis of the results in Propositions 1 and 2, we can conclude that the computation effort in the closed-form expression for obtaining the indexes

c_{j}^{(k)}

, i.e., the batch mean shifts

s

, is less than the effort needed in the recursive expression.

5. Conclusions

In this study, we reinvestigated the DBM algorithm and modified it with a binary tree hierarchy. With a well-defined binary codeword, this study proposes a closed-form expression for the DBM estimator instead of using crude recursive relationships. This closed-form expression yields a mathematical expression that is clearly defined by our algebraic binary relation. Based on the computation effort analysis, we can conclude that the computation effort in the closed-form expression for obtaining the indexes

c_{j}^{(k)}

is less than the effort needed in the recursive expression.

Funding

This research was supported by the Ministry of Science and Technology of Taiwan under Grant No. MOST 104-2218-E-005-009.

Acknowledgments

The author sincerely thanks the editor and two anonymous reviewers for their perceptive comments and valuable suggestions to improve this paper.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Song, W.-M.T.; Schmeiser, B.W. Optimal mean-squared-error batch size. Manag. Sci. 1995, 41, 110–123.
2. Aktaran-Kalaycı, T.; Alexopoulos, C.; Argon, N.T.; Goldsman, D.; Wilson, J.R. Exact expected values of variance estimators for simulation. Nav. Res. Logist. 2007, 54, 397–410.
3. Chen, E.J. A Stopping Rule Using the Quasi-Independent Stopping Sequence. J. Simul. 2012, 6, 71–80.
4. Pedrielli, G.; Zhu, Y.; Lee, L.H.; Li, H. Empirical analysis of the performance of variance estimators in sequential single-run Ranking & Selection: The case of Time Dilation algorithm. In Proceedings of the Winter Simulation Conference, Washington, DC, USA, 11–14 December 2016; pp. 738–748.
5. Currie, C.S.M.; Cheng, R.C.H. A practical introduction to analysis of simulation output data. In Proceedings of the Winter Simulation Conference, Washington, DC, USA, 11–14 December 2016; pp. 118–132.
6. Chen, E.J. Simulation Output Analysis and Risk Management; IGI Global: Hershey, PA, USA, 2016; pp. 200–220.
7. Conway, R.W. Some tactical problems in digital simulation. Manag. Sci. 1963, 6, 92–110.
8. Meketon, M.S.; Schmeiser, B.W. Overlapping batch means: Something for nothing? In Proceedings of the Winter Simulation Conference, Dallas, TX, USA, 28–30 November 1984; pp. 227–230.
9. Schmeiser, B.W. Batch size effects in the analysis of simulation output. Oper. Res. 1982, 30, 556–568.
10. Welch, P.D. On the relationship between batch means, overlapping batch means and spectral estimation. In Proceedings of the Winter Simulation Conference, Atlanta, GA, USA, 14–16 December 1987; pp. 320–323.
11. Song, W.-M.T. The Song rule outperforms optimal-batch-size variance estimators in simulation output analysis. Eur. J. Oper. Res. 2019, 275, 1072–1082.
12. Vats, D.; Flegal, J.M.; Jones, G.L. Multivariate output analysis for Markov chain Monte Carlo. Biometrika 2019, 106, 321–337.
13. Fishman, G.S. Grouping observations in digital simulation. Manag. Sci. 1978, 24, 510–521.
14. Song, W.-M.T. A finite-memory algorithm for batch means estimators in simulation output analysis. IEEE Trans. Autom. Control 2011, 56, 1157–1162.
15. Song, W.-M.T.; Chih, M. Extended dynamic partial-overlapping batch means estimators for steady-state simulations. Eur. J. Oper. Res. 2010, 203, 640–651.
16. Song, W.-M.T.; Chih, M. Run Length Not Required: Optimal-MSE Dynamic Batch Means Estimators for Steady-State Simulation. Eur. J. Oper. Res. 2013, 229, 114–123.
17. Fox, B.L.; Goldsman, D.; Swain, J.J. Spaced batch means. Oper. Res. Lett. 1991, 10, 255–263.
18. Yeh, Y.; Schmeiser, B.W. Simulation output analysis via dynamic batch means. In Proceedings of the Winter Simulation Conference, Orlando, FL, USA, 10–13 December 2000; pp. 637–645.
19. Chih, M.; Song, W.-M.T. An efficient approach to implement dynamic batch means estimators in simulation output analysis. J. Chin. Inst. Ind. Eng. 2012, 29, 163–180.
20. Chan, S.-L.; Golin, M.J. A dynamic programming algorithm for constructing optimal “1”-ended binary prefix-free codes. IEEE Trans. Inf. Theory 2000, 46, 1637–1644.
21. Tseng, Y.-C.; Chao, C.-M. Code placement and replacement strategies for wideband CDMA OVSF code tree management. IEEE Trans. Mob. Comput. 2002, 1, 293–302.
22. Mäkinen, E. A survey on binary tree codings. Comput. J. 1991, 34, 438–443.

Figure 1. Values of

c_{j}^{(k)}

in dynamic overlapping batch means estimator (DOBM) with the recursive relation.

Figure 2. Utilizing rooted binary code tree to represent

c_{j}^{(k)}

.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
