Appendix A
In this appendix, we illustrate how to use the Path Segmentation method to determine the average thermal optimal path when the related g(i, t − i) are unacceptably large. We first define some notation.
Let n* be a threshold. Suppose that the cumulative Boltzmann factor g(i,j) can be computed whenever both i and j are not greater than n*; otherwise the value of g(i,j) becomes unacceptably large. A suitable value of n* is not difficult to determine from the empirical results of any specific case.
Let n be the maximum index of the data. Since we denote the first data pair as (x0, y0), the sample size is n + 1. We divide all samples into H groups. Neither the number of groups nor the number of samples in each group is unique. In our case, for convenience of calculation, we divide all samples into 3 groups of almost equal size, i.e., H = 3. The first group consists of n1 + 1 data pairs, the second group consists of n1, and the third consists of n − 2n1. We define n1 as the partition coefficient in this paper. Once H is set to 3, we require that 2n1 < n ≤ 3n1, n* ≥ n1 + 3β, and n1 > 2β, where β, a positive integer, represents the maximum possible lag. For the general case, the requirements become (H − 1)n1 < n ≤ Hn1, n* ≥ n1 + Hβ, and n1 > (H − 1)β. These requirements on n1, n* and β can be met easily, because the way to divide the samples is not unique. Furthermore, in order to use as much data as possible, and therefore as much information from the data as possible, we can set n1 to the minimum of the integer part of n/2 and n* − 3β. In this study, n = 1249 and we set n1 = 420, n* = 510, and β = 30.
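The constraints above are easy to check numerically. The following is a minimal sketch using the values reported in this study; the function name `check_partition` and the generalization of the group sizes to arbitrary H are assumptions made for illustration only.

```python
def check_partition(n, n1, n_star, beta, H=3):
    """Return the group sizes if the stated constraints hold, else raise ValueError."""
    if not ((H - 1) * n1 < n <= H * n1):
        raise ValueError("need (H-1)*n1 < n <= H*n1")
    if not (n_star >= n1 + H * beta):
        raise ValueError("need n* >= n1 + H*beta")
    if not (n1 > (H - 1) * beta):
        raise ValueError("need n1 > (H-1)*beta")
    # First group: n1 + 1 data pairs (indices 0..n1); middle groups: n1 each;
    # last group: the remainder. For H = 3 this matches the sizes in the text;
    # for other H it is an assumed generalization.
    return [n1 + 1] + [n1] * (H - 2) + [n - (H - 1) * n1]


n, n_star, beta = 1249, 510, 30
# Rule from the text: the minimum of the integer part of n/2 and n* - 3*beta.
n1 = min(n // 2, n_star - 3 * beta)
sizes = check_partition(n, n1, n_star, beta)
print(n1, sizes, sum(sizes))
```

With these values the rule reproduces n1 = 420, and the three group sizes add up to the sample size n + 1.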
Now we present the procedure for determining the average thermal path, which involves the computation of g(i,j) for i, j subject to n* < i, j < n* + n1 − β. It is obviously infeasible to compute g(i,j) directly when both i, j > n*. In our case, we find that g(i,j) is still not computable when i > n* and 0 < j ≤ n*, or when j > n* and 0 < i ≤ n*. We will discuss this later. Let us return to the case n* < i, j < n* + n1 − β. In this case, i, j > n1 + β is also satisfied, because of the assumption n* ≥ n1 + 3β.
It is easy to see that any path from the origin (0,0) to the end point (i,j) must pass through a point among the 2β + 1 points satisfying {(t1,t2): t1 = n1, n1 − β ≤ t2 ≤ n1 + β, t1 and t2 are integers}, because all paths must lie within −β ≤ t2 − t1 ≤ β under assumption (14), and i, j > n* ≥ n1 + 3β. Thus, the set of all paths from the origin to (i,j) can be described as a union of 2β + 1 disjoint path sets, one for each of the 2β + 1 points (n1, n1 − β), (n1, n1 − β + 1), …, (n1, n1 + β).
Let P_{K+1} denote the path set defined as follows:
P_{K+1} = {C(1): C(1) = C_{(0,0)→(n1, n1−β+K)}} + {C(2): C(2) = C_{(n1, n1−β+K)→(i,j)}} = {C: C = C(1) + C(2)}, K = 0,1,…,2β.
Here C_{(0,0)→(n1, n1−β+K)} represents a path starting from (0,0) and ending at (n1, n1 − β + K). Moreover, this path has only one point of intersection with the vertical line t1 = n1 in the (t1, t2) coordinates. C_{(n1, n1−β+K)→(i,j)} represents an arbitrary path starting from (n1, n1 − β + K) and ending at (i,j). C is a path from the origin (0,0) to (i,j). Thus, the union of the P_{K+1}, K = 0,1,…,2β, is the set of all paths starting from (0,0) and ending at (i,j), and the path sets P_{K+1} are disjoint. "Disjoint" here means that no path belongs to two path sets P_{K+1} with different values of K. We discuss below how to obtain the cumulative Boltzmann factors of the different path sets P_{K+1}.
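The decomposition into disjoint path sets can be verified by brute force on a small lattice. The sketch below is illustrative only: it assumes that a path advances by one step in t1, in t2, or in both (a step rule not stated in this appendix), it ignores the band constraint |t2 − t1| ≤ β, and it uses random toy energies. The helper names `cumulative` and `segmented` are hypothetical.

```python
import math
import random


def cumulative(w, start, end):
    """Sum over lattice paths from start of the product of node weights.

    Assumed step rule: (+1, 0), (0, +1) or (+1, +1). The start node's own
    weight is excluded, so two segments can be multiplied without counting
    their shared junction node twice. Returns a dict keyed by (t1, t2).
    """
    (a0, b0), (a1, b1) = start, end
    g = {start: 1.0}
    for a in range(a0, a1 + 1):
        for b in range(b0, b1 + 1):
            if (a, b) != start:
                prev = (g.get((a - 1, b), 0.0) + g.get((a, b - 1), 0.0)
                        + g.get((a - 1, b - 1), 0.0))
                g[a, b] = w[a][b] * prev
    return g


def segmented(w, m, end):
    """Rebuild the total at `end` from paths grouped by where they first touch t1 = m."""
    i, j = end
    left = cumulative(w, (0, 0), (m - 1, j))  # region strictly left of the line
    total = 0.0
    for k in range(j + 1):
        # A first arrival at (m, k) must come from column m - 1,
        # by a horizontal or a diagonal step.
        first_hit = w[m][k] * (left.get((m - 1, k), 0.0)
                               + left.get((m - 1, k - 1), 0.0))
        total += first_hit * cumulative(w, (m, k), end)[end]
    return total


random.seed(0)
T = 2.0  # toy temperature
size = 7
eps = [[random.random() for _ in range(size)] for _ in range(size)]  # toy energies
w = [[math.exp(-e / T) for e in row] for row in eps]

end = (6, 5)
direct = cumulative(w, (0, 0), end)[end]
via_line = segmented(w, 3, end)
print(direct, via_line)
```

Under these assumptions the two numbers agree up to rounding, because every path touches the line t1 = m for the first time at exactly one point, so the groups neither overlap nor miss any path.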
① For C ∈ P_1 (K = 0), C can be divided into two segments denoted by C(1) and C(2), i.e., C = C(1) + C(2) = C_{(0,0)→(n1, n1−β)} + C_{(n1, n1−β)→(i,j)}. The energy corresponding to C can be described by:
So, the Boltzmann factor corresponding to path C is:
Furthermore, we have:
where
.
g_1 represents the sum of the Boltzmann factors over all paths starting from (n1, n1 − β) and ending at (i,j). Since n* < i, j < n* + n1 − β, we have i − n1 < n* − β < n* and j − (n1 − β) < n*. This implies that the differences between the coordinates of the starting point (n1, n1 − β) and the end point (i,j), i.e., i − n1 and j − n1 + β, are each not greater than n*. So g_1 is computable, because no more than n* data pairs are used. By a similar analysis, the first term on the rightmost side of Equation (A3) is also computable. In short, the cumulative Boltzmann factor on the leftmost side of (A3) is computable.
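The inequalities used in case ① can be checked directly with the parameter values of this study. This is only a numerical sanity check of the stated bounds, not part of the method itself; the variable names are chosen for illustration.

```python
n1, n_star, beta = 420, 510, 30

start = (n1, n1 - beta)  # lowest boundary point, K = 0
# All integer targets with n* < i, j < n* + n1 - beta.
targets = range(n_star + 1, n_star + n1 - beta)

# Largest coordinate differences between the start point and any target (i, j).
max_span_i = max(i - start[0] for i in targets)
max_span_j = max(j - start[1] for j in targets)
print(max_span_i, max_span_j)
```

Both spans stay below n*, which is the condition the text relies on to call g_1 computable.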
② When C ∈ P_2, corresponding to the case K = 1, we have:
In the expression above, let the extended path be C(2) + (n1, n1 − β) = C_{(n1, n1−β+1)→(i,j)} + (n1, n1 − β); this extended path starts from (n1, n1 − β) and ends at (i, j). We define:
The energy of a path C ∈ P_2 can be described by:
③ When C ∈ P_{K+1}, corresponding to the general case, we similarly have:
Furthermore, we have:
where the path and the path set are defined as follows:
More specifically, in this case, C(2) represents a path from (n1, n1 − β + K) to the point (i,j), and the extended path starts from (n1, n1 − β), passes through the points (n1, n1 − β + 1), (n1, n1 − β + 2), …, (n1, n1 − β + K − 1), (n1, n1 − β + K), and ends at the point (i,j). The corresponding path set includes all such extended paths. Furthermore, we define:
The path in the definition above starts from (0,0), first arrives at the point (n1, n1 − β + K), then passes through (n1, n1 − β + K + 1), (n1, n1 − β + K + 2), …, (n1, n1 + β − 1), and ends at (n1, n1 + β). Moreover, it does not include any of the points (n1, n1 − β), (n1, n1 − β + 1), …, (n1, n1 − β + K − 1). According to this definition, we have:
Substituting (A8) into the first term on the right-hand side of Equation (A6), we have:
According to the definition of
, we have
. Let g_{K+1} denote the second term on the RHS of Equation (A9). g_{K+1} should decrease as K increases, i.e., g_{2β+1} ≤ g_{2β} ≤ … ≤ g_{K+1} ≤ … ≤ g_2 ≤ g_1. More specifically, g_1 is the maximum among {g_{K+1}, K = 0,1,…,2β} and represents the cumulative Boltzmann factor of all paths from (n1, n1 − β) to (i, j) under the conditions |j − i| ≤ β and n* < i, j < n* + n1 − β. According to the analysis of case ①, g_1 is computable. g_{2β+1} is the minimum among {g_{K+1}, K = 0,1,…,2β} and represents the product of the Boltzmann factor of the path from (n1, n1 − β) to (n1, n1 + β − 1) and the cumulative Boltzmann factor of all paths from (n1, n1 + β) to (i, j). g_{2β+1} is also computable, because it is not greater than the computable g_1. We use the average of g_1 and g_{2β+1}, denoted by r_{i,j}, to approximate the values of all g_{K+1} (K = 0,1,…,2β), for the convenience of the following computations and analyses. The third term on the right-hand side of Equation (A9) is a constant and can be denoted by D. Substituting r_{i,j} and D for the second and third terms on the right-hand side of (A9), respectively, we have:
Furthermore, we can obtain the cumulative Boltzmann factor of all paths from (0,0) to (i, j) by the following expression:
In (A11), g(n1, n1 + β) denotes the cumulative Boltzmann factor of all possible paths from (0,0) to (n1, n1 + β). According to the previous definition, the path sets of the extended first segments together contain all paths starting from (0,0) and ending at (n1, n1 + β), and they are disjoint for different K. Therefore, the sum of the cumulative Boltzmann factors of these path sets is exactly the cumulative Boltzmann factor of all possible paths from (0,0) to (n1, n1 + β), and can be denoted by g(n1, n1 + β). So far, we have approximately expressed g(i, j), the cumulative Boltzmann factor from (0,0) to (i, j), as the product of g(n1, n1 + β), the cumulative Boltzmann factor from (0,0) to (n1, n1 + β), and r_{i,j}, the average of the cumulative Boltzmann factor from (n1, n1 + β) to (i, j) and the cumulative Boltzmann factor from (n1, n1 − β) to (i, j), when n* < i, j < n* + n1 − β. Furthermore, we can compute the average thermal path as follows:
To obtain the expression above and to simplify the analysis, we supposed that all i and t − i in (A12) are between n* and n* + n1 − β. In fact, the average thermal path can be obtained by (A12) as long as the maximum value of the independent variable i of the g(i,t − i) used in (15) is between n* and n* + n1 − β. Let us explain in more detail. If the maximum value of i among all g(i,t − i) in expression (15) is greater than n*, then the minimum value of i must be greater than n* − β, owing to the condition of the Pruning Algorithm given by (14). This implies that all possible values of the independent variables i and t − i of the g(i,t − i) involved in expression (15) are greater than n* − β. Since we suppose n* ≥ n1 + 3β, all i and t − i greater than n* − β are also greater than n1 + β. Thus, based on the previous analysis, every g(i,t − i) in which both i and t − i are greater than n1 + β can be approximately described by (A11), and the average thermal path can correspondingly be determined by expression (A12). In other words, if even one term g(i,t − i) used in (15) is unacceptably large, all other related terms g(i,t − i) also take quite large values; therefore all the g(i,t − i) used can be described by (A11), and the average thermal path can be determined by expression (A12).
In summary, expression (A12), called the Path Segmentation algorithm in this paper, can be used to obtain the average thermal path when the maximum value of the independent variable i of the g(i,t − i) used in (15) is between n* and n* + n1 − β.
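One way to see why the segmentation helps is that a factor shared by every weight cancels in a normalized average. The sketch below illustrates only this cancellation. It assumes that the average in (15) is a ratio of weighted sums with weights g(i, t − i), which is not shown in this appendix, and the lag values and local factors are made-up numbers.

```python
from fractions import Fraction


def weighted_average(values, weights):
    """Normalized weighted average: sum(v * w) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical lag values and hypothetical local factors r_{i, t-i}.
lags = [-2, -1, 0, 1, 2]
r = [Fraction(1, 8), Fraction(1, 4), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# A huge common factor standing in for the part shared by all g(i, t - i);
# it is far beyond the range of a double-precision float.
common = Fraction(10) ** 400

small = weighted_average(lags, r)
large = weighted_average(lags, [common * x for x in r])
print(small == large)
```

Exact rational arithmetic is used here so that the equality is exact; in floating point the same idea means that only the moderate-sized factors r_{i,j} ever need to be stored.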
Let us now discuss the case in which the maximum value of the independent variable i of the g(i, t − i) used in (15) to compute the average thermal path is greater than n* + n1 − β and less than n. We change the segmentation boundary from {(t1, t2): t1 = n1, n1 − β ≤ t2 ≤ n1 + β, t1 and t2 are integers} to {(t1, t2): t1 = 2n1, 2n1 − β ≤ t2 ≤ 2n1 + β, t1 and t2 are integers}. If we kept {(t1, t2): t1 = n1, n1 − β ≤ t2 ≤ n1 + β, t1 and t2 are integers} as the segmentation boundary, the computation of g(i, t − i) with i greater than n* + n1 − β could involve n* or more data pairs, which is likely to result in an unacceptably large value of g(i, t − i), since i − (n1 − β) > n* + n1 − β − (n1 − β) = n*. In this case, the minimum value of the independent variable i of the g(i, t − i) used in (15) must be greater than (n* + n1 − β) − β = n* + n1 − 2β, since the difference between the maximum and minimum values of i is not greater than β. Furthermore, all possible values of the two independent variables i and t − i of the g(i, t − i) used in (15) are between n* + n1 − 2β and n. As in the case n* < i, j = t − i < n* + n1 − β, every path starting from (0,0) and ending at (i, t − i) can be divided into two parts, one from (0,0) to (2n1, 2n1 − β + K) and the other from (2n1, 2n1 − β + K) to (i, t − i), where K = 0,1,…,2β, since i − (2n1 + β) > n* + n1 − 2β − (2n1 + β) = n* − (n1 + 3β) ≥ 0 and t − i − (2n1 + β) > n* − (n1 + 3β) ≥ 0. Correspondingly, we have:
where r_{i,j} is the average of the cumulative Boltzmann factor from (2n1, 2n1 + β) to (i, j) and the cumulative Boltzmann factor from (2n1, 2n1 − β) to (i, j); and D is a constant equal to
. Furthermore, we have:
Finally, expression (A14), also called the Path Segmentation algorithm in this paper, can be used to obtain the average thermal path when the maximum value of the independent variable i of the g(i,t − i) used in (15) is between n* + n1 − β and n.
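The same kind of sanity check as in case ① can be repeated for the shifted boundary t1 = 2n1, again with the parameter values of this study. The span check on the second segment is an analogy with case ① and is an assumption of this sketch rather than a statement made in the text.

```python
n, n1, n_star, beta = 1249, 420, 510, 30

# Every i and t - i in this case exceeds this value.
lower = n_star + n1 - 2 * beta
boundary_bottom = (2 * n1, 2 * n1 - beta)  # K = 0
boundary_top = (2 * n1, 2 * n1 + beta)     # K = 2*beta

# Targets lie beyond the whole boundary segment when lower >= 2*n1 + beta,
# which is the same condition as n* >= n1 + 3*beta.
beyond_boundary = lower - boundary_top[1]

# Largest coordinate differences of the second segment,
# from the lowest boundary point to the far corner (n, n).
span_t1 = n - boundary_bottom[0]
span_t2 = n - boundary_bottom[1]
print(beyond_boundary, span_t1, span_t2)
```

With n* = n1 + 3β the margin beyond the boundary is exactly zero, which is still sufficient because i and t − i are strictly greater than n* + n1 − 2β.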