Article

Comparison Between Bayesian and Maximum Entropy Analyses of Flow Networks†

School of Engineering and Information Technology, The University of New South Wales, Canberra, ACT 2600, Australia
* Author to whom correspondence should be addressed.
Entropy 2017, 19(2), 58; https://doi.org/10.3390/e19020058
Submission received: 22 December 2016 / Accepted: 22 January 2017 / Published: 2 February 2017

Abstract
We compare the application of Bayesian inference and the maximum entropy (MaxEnt) method for the analysis of flow networks, such as water, electrical and transport networks. The two methods have the advantage of allowing a probabilistic prediction of flow rates and other variables, when there is insufficient information to obtain a deterministic solution, and also allow the effects of uncertainty to be included. Both methods of inference update a prior to a posterior probability density function (pdf) by the inclusion of new information, in the form of data or constraints. The MaxEnt method maximises an entropy function subject to constraints, using the method of Lagrange multipliers, to give the posterior, while the Bayesian method finds its posterior by multiplying the prior with likelihood functions incorporating the measured data. In this study, we examine MaxEnt using soft constraints, either included in the prior or as probabilistic constraints, in addition to standard moment constraints. We show that when the prior is Gaussian, both Bayesian inference and the MaxEnt method with soft prior constraints give the same posterior means, but their covariances are different. In the Bayesian method, the interactions between variables are applied through the likelihood function, using second or higher-order cross-terms within the posterior pdf. In contrast, the MaxEnt method incorporates interactions between variables using Lagrange multipliers, avoiding second-order correlation terms in the posterior covariance. The MaxEnt method with soft prior constraints, therefore, has a numerical advantage over Bayesian inference, in that the covariance terms are avoided in its integrations. The second MaxEnt method with soft probabilistic constraints is shown to give posterior means of similar, but not identical, structure to the other two methods, due to its different formulation.

1. Introduction

The analysis of flow rates on networks is required for the design and monitoring of electrical, water, sewer, irrigation, fire suppression, drainage, oil, gas and any other networks through which fluids or energy are transported, and is therefore an important engineering problem. Traditionally, these systems have been analysed using deterministic methods. These methods incorporate physical laws, such as Kirchhoff’s first and second laws (conservation of mass and equivalence of potentials at nodes), and sufficient known parameter values, giving a closed set of equations which is solved for the (deterministic) solution. Deterministic methods yield precise parameter values but do not consider uncertainty, whether due to a lack of knowledge of the state of the system or to flow variability. To account for uncertainty, a probabilistic framework is required. There are two primary methods for probabilistic inference: Bayesian inference using Bayes’ rule, and maximum entropy (MaxEnt) analysis.
Bayes’ theorem follows from the product rule of probability. To use Bayes’ theorem, the prior and likelihood functions must be chosen before the data are analysed. To analyse the data, a set of data values is incorporated in the likelihood function, which is then combined with the prior by Bayes’ rule to obtain the posterior. This process can be repeated for each data set by using the posterior as the prior for the next data set; the order in which the data sets are analysed does not affect the final result.
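As a minimal illustration of this sequential updating (a sketch, not an example from the paper), consider a scalar Gaussian prior on a single flow rate updated with two independent Gaussian observations; processing the two data sets in either order yields the same posterior:

```python
import numpy as np

def gaussian_update(m, s2, y, r2):
    """Conjugate update of a Gaussian prior N(m, s2) by one Gaussian
    observation y with noise variance r2; returns the posterior (mean, var)."""
    k = s2 / (s2 + r2)                 # scalar gain
    return m + k * (y - m), (1.0 - k) * s2

m0, s0 = 10.0, 4.0                     # assumed prior mean and variance
obs = [(12.0, 1.0), (9.0, 2.0)]        # (value, noise variance) of two data sets

# Process the data sets in both orders; the posterior is the same.
mA, sA = gaussian_update(*gaussian_update(m0, s0, *obs[0]), *obs[1])
mB, sB = gaussian_update(*gaussian_update(m0, s0, *obs[1]), *obs[0])
print(np.allclose([mA, sA], [mB, sB]))  # True: order does not matter
```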
The analysis presented here follows the well-known Bayesian parameter estimation or regression procedure described in [1]. Although the applications of this procedure are extremely broad, including least-squares regression, the authors are not aware of it being applied in its general form to estimate flows on a network. An example of using Bayes’ theorem with transient pipe flows is presented by Rougier and Goldstein [2], who solve the water hammer partial differential equations, incorporating uncertainty from the pipeline characteristics and the boundary conditions. Bayes’ theorem is used to estimate the flows, pressures and pipeline characteristics as time progresses, using data obtained through real-time monitoring of the pipeline in a few locations. As this method requires the solution of a partial differential equation incorporating time and uncertainty, its computational cost is high, and it is therefore restricted to small networks; in their example, a single pipe is analysed. The Bayesian method can also be used to calibrate model parameters, often using least-squares regression. Savic et al. [3] provide a comprehensive review of calibration techniques used with water networks. As an alternative to predicting model parameters, Hutton et al. [4] use Bayes’ theorem to update the coefficients in an autoregressive, data-driven model to predict future flow rates (at two locations in their example) using current and previous flow rate observations. In their case study, they were able to provide accurate one-hour forecasts for the monitored locations. Hutton and Kapelan [5] extended this analysis to predict pipe bursts by considering the difference between their predicted and observed flow rates. They were able to detect abnormal flow conditions representing pipe bursts greater than 5% of normal flow conditions.
Entropy is a measure of uncertainty [6,7,8,9,10]. The MaxEnt method for inference can be derived from an axiomatic approach based on the axioms of locality, coordinate invariance and subsystem independence [11,12,13]. Alternatively, the MaxEnt method can be derived from a combinatorial approach [14,15,16,17], which shows that the MaxEnt method infers the most probable distribution, subject to the constraints and prior [6,14,15,16,17,18,19,20]. The maximum relative entropy method (MaxEnt), equivalent to the minimum Kullback–Leibler divergence [21], is used to infer or update a probability distribution describing an under-determined system, which respects all constraints imposed on the system and is closest to the prior distribution [8]. However, MaxEnt is a method of inference, with no guarantee that the inferred solution will be realised [6,8,9]; the validity of the distribution will depend on the assumptions used to construct the MaxEnt model. The MaxEnt method has been applied to predict the flows on water distribution networks [22,23,24,25], transportation networks [26], electrical networks [26] and generic flow networks [27,28,29].
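As a minimal illustration of MaxEnt inference outside the flow-network setting (an invented example with an assumed target mean of 4.5), the classic die problem shows the mechanics: the entropy-maximising distribution over the faces 1–6 subject to a mean constraint takes the Boltzmann form p_i ∝ exp(−λi), with the Lagrange multiplier λ set by the constraint:

```python
import numpy as np
from scipy.optimize import brentq

faces = np.arange(1, 7)                # discrete support: die faces 1..6

def mean_for(lam):
    p = np.exp(-lam * faces)
    p /= p.sum()                       # normalisation constraint
    return p @ faces                   # mean under the Boltzmann form

# Solve the moment constraint <i> = 4.5 for the Lagrange multiplier.
lam = brentq(lambda l: mean_for(l) - 4.5, -5.0, 5.0)
p = np.exp(-lam * faces); p /= p.sum()
print(p.round(4), p @ faces)           # MaxEnt distribution and its mean (4.5)
```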
There have been many studies on the connection between Bayes’ theorem and the MaxEnt method, with some authors suggesting that one can be obtained from the other (in either direction) [30,31,32,33]. Giffin and Caticha’s method [31,32,33] to obtain Bayes’ rule using MaxEnt requires the relative entropy function to be defined over the model parameters and the data. The normalisation constraint is applied, and the variables representing the data are constrained with a pdf (probability density function) representing the Bayesian likelihood function. The pdf constraint is applied to the parameters defined to be the prior in the Bayesian method. Bayes’ rule is obtained by dividing the inferred distribution by the pdf defined over the data parameters, i.e., dividing by the pdf constraint. Although this equivalence is mathematically correct, the second constraint appears somewhat contrived as it is applied over the parameters of the Bayesian prior.
The current authors have compared the probability distributions of quasi-Newton rules obtained when inferring the Jacobian or Hessian using Bayesian inference [34,35] and the MaxEnt method [36]. In both methods, the same Gaussian prior was used. In the MaxEnt method, secant equations were used as the constraints. In Bayes’ method, delta likelihood functions incorporating the secant equation were used to represent the data. It was found that both methods obtained the same posterior means, but the covariance matrices were found to be different.
In this study, we develop a Bayesian method to analyse flow networks (Section 2). This theory contains many features in common with the MaxEnt method of [25]. In Section 3.1, we present a MaxEnt theory using soft constraints that are implemented in the prior pdf. In Section 3.2, we compare the distributions obtained by the two methods. In Section 4.1, we also present a MaxEnt theory with soft constraints implemented using pdfs as constraints, and, in Section 4.2, we compare this to the Bayesian method. Finally, in Section 5, we discuss our findings.

2. Bayesian Analysis

Consider a flow network with N external flow rates and M internal flow rates, assembled into the vectors Θ and Q, respectively; these, in turn, are assembled into the vector
$$\Psi = \begin{bmatrix} \Theta \\ Q \end{bmatrix}. \tag{1}$$
In the Bayesian method, to avoid inconsistencies due to different network representations, we consider a basis set X of n flow rates selected from Ψ as parameters of the pdf used to represent the uncertainty. The indices of the basis set X in Ψ are given by the set ℬ, while the indices of the complementary non-basis set of flow rates in Ψ are given by 𝒩. For closure, at least N − 1 basis flow rates must be chosen, but up to N + M can be chosen. The derivation of the Bayesian method requires a prior belief of the state of the system, represented as a prior pdf q(X), which is updated using observed data to a posterior pdf according to Bayes’ rule:
$$p(X \mid y) = \frac{p(y \mid X)\, q(X)}{\int_{\Omega} p(y \mid X)\, q(X)\, dX}, \tag{2}$$
where p(y | X) is the likelihood function, the denominator provides normalisation, X is the basis set of flow rates, y is the vector of observed data and Ω is the domain of X. The flow rates X̄ not included in the basis set are taken as functions of the model parameters X, using:
$$\bar{X} = V X, \tag{3}$$
where
$$V = -A_{i \in \mathcal{V},\, j \in \mathcal{N}}^{-1}\, A_{i \in \mathcal{V},\, j \in \mathcal{B}}, \tag{4}$$
$$A = \begin{bmatrix} C \\ W \operatorname{diag}(K) \\ F \\ T \operatorname{diag}(K) \end{bmatrix}, \tag{5}$$
in which
  • diag(·) places the elements of a vector on the diagonal of a square matrix of zeros;
  • the set 𝒱 contains the N + M − n indices of the equations required to uniquely define X̄ from X;
  • the matrix C is an N × (N + M) connectivity matrix containing elements {−1, 0, 1}. Its entries C_{i,r}, i ∈ {1, …, N}, indicate membership of edge r to node i, given by 0 if the edge is not connected to the node, 1 if the assumed direction of Q_m or Θ_i is entering the node, and −1 otherwise;
  • the vector K is an (M + N) × 1 vector of flow resistances;
  • the matrix W is a w × (N + M) loop matrix containing elements {−1, 0, 1}, where w is the number of independent cycles (loops) within the network. Its entries W_{i,r}, i ∈ {1, …, w}, indicate membership of edge r within loop i, given by 0 if the edge is not in the loop, 1 if the assumed direction of Q_m is clockwise around the loop, and −1 otherwise;
  • the matrix F is an (N_Θ̂ + N_Q̂) × (N + M) matrix containing either 0 or 1 in each of its elements. Each row has a single 1 at the index corresponding to the dimension of the observed link, with the remaining elements set to 0;
  • N_Θ̂ and N_Q̂ are the numbers of flow rate observation locations for flows entering/exiting the network and within it, respectively; and
  • the matrix T is an h_c × (M + N) pseudo-loop matrix containing elements {−1, 0, 1}, where h_c is the number of potential difference constraints applied. The pseudo-loop matrix contains paths between nodes of known pressure or potential values. For convenience, these are referenced to the potential at a single reference node H_0; this gives Y_T as the h_c × 1 vector of mean potential differences between H_0 and H_j, for all nodes j with potential observations. The entries T_{i,r}, i ∈ {1, …, h_c}, indicate membership of edge r within potential difference constraint i, given by 0 if the edge is not in the constraint, 1 if the assumed direction of Q_m is from node 0 to node j, and −1 otherwise (a small numerical illustration of these definitions is given in the sketch below).
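To make these definitions concrete, the following minimal sketch assembles C, W and F for a hypothetical 4-node, 5-edge network (an invented example, not one from the paper) and checks Kirchhoff’s first law for one consistent flow pattern:

```python
import numpy as np

# Hypothetical network: external flows Theta_1..Theta_4, one per node;
# internal edges with assumed directions Q1: 1->2, Q2: 2->3, Q3: 3->4,
# Q4: 4->1, Q5: 1->3. Columns are ordered [Theta_1..Theta_4, Q1..Q5].

# Connectivity matrix C (N x (N+M)): +1 entering a node, -1 leaving it.
C = np.array([
    [1, 0, 0, 0, -1,  0,  0,  1, -1],   # node 1
    [0, 1, 0, 0,  1, -1,  0,  0,  0],   # node 2
    [0, 0, 1, 0,  0,  1, -1,  0,  1],   # node 3
    [0, 0, 0, 1,  0,  0,  1, -1,  0],   # node 4
])

# Loop matrix W (w x (N+M)): w = 2 independent cycles; external columns zero.
W = np.array([
    [0, 0, 0, 0,  1,  1,  0,  0, -1],   # loop 1-2-3-1 (Q5 traversed backwards)
    [0, 0, 0, 0,  0,  0,  1,  1,  1],   # loop 1-3-4-1 (via Q5, Q3, Q4)
])

# Observation matrix F: a single flow meter on edge Q2 (column index 5).
F = np.zeros((1, 9)); F[0, 5] = 1.0

# A flow pattern satisfying conservation of mass at every node:
Psi = np.array([2.0, 0.0, -2.0, 0.0, 1.2, 1.2, 0.2, 0.2, 1.0])
print(np.allclose(C @ Psi, 0.0))   # True: Kirchhoff's first law holds
print(F @ Psi)                     # [1.2]: the metered flow on Q2
```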
The prior is chosen to represent one’s belief of the system state before incorporating any measured data. Although any distribution which represents what is believed about the system state could be chosen, in this study, a multidimensional Gaussian distribution is selected, defined over the real domain:
$$q(X) = \frac{\exp\left( -\tfrac{1}{2} (X - m)^\top \Sigma^{-1} (X - m) \right)}{(2\pi)^{n/2}\, |\Sigma|^{1/2}}, \tag{6}$$
where m is the n × 1 vector of prior mean flow rates and Σ is the n × n matrix of prior covariances.
In Bayes’ method, likelihood functions are used to incorporate the physics of the system as well as any observed data, as follows:
  • The likelihood function to incorporate conservation of mass (or of flow rate, for incompressible systems) at each node, i.e., Kirchhoff’s first law, is given by a delta function
    $$p(0 \mid X) = \delta\left( 0 - \left( C_X + C_{\bar{X}} \right) X \right), \tag{7}$$
    where
    $$C_X = C_{i \notin \mathcal{V},\, j \in \mathcal{B}}, \tag{8}$$
    $$C_{\bar{X}} = C_{i \notin \mathcal{V},\, j \in \mathcal{N}}\, V. \tag{9}$$
    This delta function is defined by the limit of a Gaussian distribution
    $$-2 \ln p(0 \mid X) \simeq \lim_{\Sigma_C \to 0} \left( 0 - \left( C_X + C_{\bar{X}} \right) X \right)^\top \Sigma_C^{-1} \left( 0 - \left( C_X + C_{\bar{X}} \right) X \right). \tag{10}$$
  • The likelihood function to incorporate the loop law for each loop, Kirchhoff’s second law, is given by a delta function
    $$p(0 \mid X) = \delta\left( 0 - \left( W_X + W_{\bar{X}} \right) X \right), \tag{11}$$
    where
    $$W_X = W_{i \notin \mathcal{V},\, j \in \mathcal{B}}\, \operatorname{diag}\left( K_{i \in \mathcal{B}} \right), \tag{12}$$
    $$W_{\bar{X}} = W_{i \notin \mathcal{V},\, j \in \mathcal{N}}\, \operatorname{diag}\left( K_{i \in \mathcal{N}} \right) V. \tag{13}$$
    This delta function is defined by the limit of the Gaussian distribution
    $$-2 \ln p(0 \mid X) \simeq \lim_{\Sigma_W \to 0} \left( 0 - \left( W_X + W_{\bar{X}} \right) X \right)^\top \Sigma_W^{-1} \left( 0 - \left( W_X + W_{\bar{X}} \right) X \right). \tag{14}$$
  • Observed flow rates can be constrained using the likelihood function
    $$-2 \ln p(Y_F \mid X) \simeq \left( Y_F - \left( F_X + F_{\bar{X}} \right) X \right)^\top \Sigma_F^{-1} \left( Y_F - \left( F_X + F_{\bar{X}} \right) X \right), \tag{15}$$
    where Y_F is the (N_Θ̂ + N_Q̂) × 1 vector of observed flow rates on the observed links, Σ_F is the (N_Θ̂ + N_Q̂) × (N_Θ̂ + N_Q̂) covariance matrix of the observations, and
    $$F_X = F_{i \notin \mathcal{V},\, j \in \mathcal{B}}, \tag{16}$$
    $$F_{\bar{X}} = F_{i \notin \mathcal{V},\, j \in \mathcal{N}}\, V. \tag{17}$$
  • Observed potential differences can be constrained with the likelihood function
    $$-2 \ln p(Y_T \mid X) \simeq \left( Y_T - \left( T_X + T_{\bar{X}} \right) X \right)^\top \Sigma_T^{-1} \left( Y_T - \left( T_X + T_{\bar{X}} \right) X \right), \tag{18}$$
    where Y_T is the h_c × 1 vector of observed potential differences between pairs of points, Σ_T is the h_c × h_c covariance matrix of the observations, and
    $$T_X = T_{i \notin \mathcal{V},\, j \in \mathcal{B}}\, \operatorname{diag}\left( K_{i \in \mathcal{B}} \right), \tag{19}$$
    $$T_{\bar{X}} = T_{i \notin \mathcal{V},\, j \in \mathcal{N}}\, \operatorname{diag}\left( K_{i \in \mathcal{N}} \right) V. \tag{20}$$
Applying Bayes’ rule with each of the likelihood functions, and expanding and dropping all terms which are not functions of X gives the posterior in the form
$$-2 \ln p(X \mid y) \simeq X^\top \Sigma^{-1} X - X^\top \Sigma^{-1} m - m^\top \Sigma^{-1} X + X^\top O^\top S^{-1} O X - y^\top S^{-1} O X - X^\top O^\top S^{-1} y, \tag{21}$$
where
$$O = \begin{bmatrix} C_X + C_{\bar{X}} \\ W_X + W_{\bar{X}} \\ F_X + F_{\bar{X}} \\ T_X + T_{\bar{X}} \end{bmatrix}, \tag{22}$$
$$S^{-1} = \begin{bmatrix} \Sigma_C^{-1} & 0 & 0 & 0 \\ 0 & \Sigma_W^{-1} & 0 & 0 \\ 0 & 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & 0 & \Sigma_T^{-1} \end{bmatrix}, \tag{23}$$
$$y = \begin{bmatrix} 0 \\ 0 \\ Y_F \\ Y_T \end{bmatrix}. \tag{24}$$
Combining like factors gives
$$-2 \ln p(X \mid y) \simeq X^\top \left( \Sigma^{-1} + O^\top S^{-1} O \right) X - X^\top \left( \Sigma^{-1} m + O^\top S^{-1} y \right) - \left( m^\top \Sigma^{-1} + y^\top S^{-1} O \right) X. \tag{25}$$
Completing the square gives
$$-2 \ln p(X \mid y) \simeq \left( X - \langle X \rangle \right)^\top \Sigma_p^{-1} \left( X - \langle X \rangle \right), \tag{26}$$
where the mean flow rates and variance matrix are given by
$$\langle X \rangle = \Sigma_p \left( \Sigma^{-1} m + O^\top S^{-1} y \right), \tag{27}$$
$$\Sigma_p = \left( \Sigma^{-1} + O^\top S^{-1} O \right)^{-1}. \tag{28}$$
Using the Woodbury matrix identity [37] to find the posterior covariance gives
$$\Sigma_p = \Sigma - \Sigma O^\top \left( S + O \Sigma O^\top \right)^{-1} O \Sigma. \tag{29}$$
The following algebra is needed to find a form which does not require the inversion of the zero covariance matrices arising from the delta functions. Right-multiplying the inverse posterior covariance by Σ O^⊤ gives
$$\Sigma_p^{-1} \Sigma O^\top = O^\top + O^\top S^{-1} O \Sigma O^\top = O^\top S^{-1} \left( S + O \Sigma O^\top \right), \tag{30}$$
then left multiplying with the posterior covariance
$$\Sigma O^\top = \Sigma_p O^\top S^{-1} \left( S + O \Sigma O^\top \right). \tag{31}$$
Extracting Σ_p O^⊤ S^{-1} by right-multiplying by (S + O Σ O^⊤)^{-1} gives
$$\Sigma O^\top \left( S + O \Sigma O^\top \right)^{-1} = \Sigma_p O^\top S^{-1}. \tag{32}$$
The posterior mean flow rates can then be found from Equation (27), by substituting Equations (29) and (32), to give
$$\langle X \rangle = m + \Sigma O^\top \left( S + O \Sigma O^\top \right)^{-1} \left( y - O m \right). \tag{33}$$
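Equation (33) is the numerically convenient form: the matrix to be inverted, S + O Σ O^⊤, remains well conditioned as the delta-function covariances Σ_C and Σ_W tend to zero, whereas Equations (27) and (28) require S^{-1}. The following sketch, using arbitrary made-up matrices rather than a real network, confirms that the two routes agree while S is still invertible:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 3                                  # basis flows, constraint rows
m = rng.normal(size=n)                       # prior mean, m
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)              # prior covariance, Sigma
O = rng.normal(size=(k, n))                  # stacked operator O, Eq. (22)
S = np.diag([1e-6, 0.5, 0.2])                # near-delta row plus soft data rows
y = rng.normal(size=k)                       # stacked data vector y, Eq. (24)

# Route 1: Eq. (33), safe even as diagonal entries of S -> 0.
x33 = m + Sigma @ O.T @ np.linalg.inv(S + O @ Sigma @ O.T) @ (y - O @ m)

# Route 2: Eqs. (27)-(28), which require S to be invertible.
Si = np.linalg.inv(S)
Sigma_p = np.linalg.inv(np.linalg.inv(Sigma) + O.T @ Si @ O)
x27 = Sigma_p @ (np.linalg.inv(Sigma) @ m + O.T @ Si @ y)

print(np.allclose(x33, x27))   # True
print(O[0] @ x33 - y[0])       # ~0: the near-delta constraint is tightly enforced
```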

3. MaxEnt Analysis with Soft Constraints Implemented in the Prior

3.1. Formulation

The maximum entropy method is defined by the following algorithm [6,9]: (i) define a probability measure over the uncertainties of interest; (ii) construct a relative entropy function; (iii) define a prior probability function and constraints; (iv) maximise the entropy subject to the constraints and prior, to infer the probability distribution which describes the system; and, if desired, (v) extract statistical moments of quantities of interest. Soft MaxEnt constraints have previously been suggested by the authors [24,25] but have not been formally derived. To implement soft constraints, we define a pdf which expresses the uncertainty in the system over a reduced parameter set consisting of a basis set X of n flow rates selected from Ψ, together with the parameter observations Y_F and Y_T. The indices of X in Ψ are again given by the set ℬ. Again, for closure, at least N − 1 basis flow rates must be chosen, but up to N + M can be chosen. The joint probability is defined to be:
$$p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T = \operatorname{Prob}\left( X \leq \Upsilon_X \leq X + dX,\; Y_F \leq \Upsilon_{Y_F} \leq Y_F + dY_F,\; Y_T \leq \Upsilon_{Y_T} \leq Y_T + dY_T \right), \tag{34}$$
where Υ_X, Υ_{Y_F} and Υ_{Y_T} are the vectors of random variables for X, Y_F and Y_T, respectively. We assume that each of the flow rate and potential difference constraints is applied as a soft constraint; this does not restrict the method to soft constraints only, and strict constraints defined by expectations can still be applied. This choice of pdf gives the following relative entropy or negative Kullback–Leibler function [21], over the space of uncertainties used in this formulation:
$$H = -\int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} p(X, Y_F, Y_T) \ln \frac{p(X, Y_F, Y_T)}{q(X, Y_F, Y_T)}\, dX\, dY_F\, dY_T, \tag{35}$$
where n_o = N_Θ̂ + N_Q̂ + h_c is the number of data observations, q(X, Y_F, Y_T) is the prior pdf, and l_i and u_i are the lower and upper bounds of the i-th flow rate. The relative entropy is then maximised subject to the constraints on the system. The following constraints are always required:
  • Normalisation of probability:
    $$1 = \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T. \tag{36}$$
  • Kirchhoff’s first law, for the conservation of flow rates at each internal node, here imposed in the mean:
    $$0 = \left( C_X + C_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} X\, p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T. \tag{37}$$
  • Kirchhoff’s second law, which requires the potential difference to vanish around each enclosed loop, again imposed in the mean:
    $$0 = \left( W_X + W_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} X\, p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T. \tag{38}$$
We also allow for any of the following constraints:
  • A set of specified inflow/outflow and internal flow rate constraints:
    $$0 = \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T. \tag{39}$$
  • Potential difference constraints between pairs of nodes:
    $$0 = \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T. \tag{40}$$
After identifying the constraints, the entropy (35) is then maximised subject to Equations (36)–(38) and whichever of Equations (39) and (40) apply. Applying the calculus of variations, we form the Lagrangian:
$$\begin{aligned}
L ={} & -\int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} p(X, Y_F, Y_T) \ln \frac{p(X, Y_F, Y_T)}{q(X, Y_F, Y_T)}\, dX\, dY_F\, dY_T \\
& - \hat{\kappa} \left( \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T - 1 \right) \\
& - \alpha \left( C_X + C_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} X\, p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T \\
& - \beta \left( W_X + W_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} X\, p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T \\
& - \lambda \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T \\
& - \eta \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) p(X, Y_F, Y_T)\, dX\, dY_F\, dY_T,
\end{aligned} \tag{41}$$
where κ̂ (a scalar) and α, β, λ and η (row vectors) are the Lagrange multipliers for the normalisation, Kirchhoff’s first and second laws, flow rate and head loss constraints, respectively. At the extremum, the variation of L vanishes, δL = 0. Extremising Equation (41) by taking the functional derivative with respect to p(X, Y_F, Y_T) and combining integrals gives:
$$\delta L = 0 = \int_{l_1}^{u_1} \cdots \int_{l_{n+n_o}}^{u_{n+n_o}} \Big[ -\ln \frac{p(X, Y_F, Y_T)}{q(X, Y_F, Y_T)} - \kappa - \alpha \left( C_X + C_{\bar{X}} \right) X - \beta \left( W_X + W_{\bar{X}} \right) X - \lambda \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) - \eta \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \Big]\, \delta p\; dX\, dY_F\, dY_T, \tag{42}$$
where κ = κ̂ + 1. Rearrangement gives the following solution for p(X, Y_F, Y_T) (the Boltzmann distribution):
$$p^*(X, Y_F, Y_T) = q(X, Y_F, Y_T)\, e^{-\kappa - \alpha \left( C_X + C_{\bar{X}} \right) X - \beta \left( W_X + W_{\bar{X}} \right) X - \lambda \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) - \eta \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)}. \tag{43}$$
This can be solved, in conjunction with the constraints (36)–(40), to give p * ( X , Y F , Y T ) and the Lagrange multipliers κ, α , β , λ and η .

3.2. Solution and Comparison to Bayesian Solution

In the MaxEnt method, we choose the prior pdf as the multidimensional Gaussian
$$q(X, Y_F, Y_T) \propto \exp\left( -\frac{1}{2} \begin{bmatrix} X - m \\ Y_F - m_F \\ Y_T - m_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} \begin{bmatrix} X - m \\ Y_F - m_F \\ Y_T - m_T \end{bmatrix} \right), \tag{44}$$
where m (an n × 1 vector) and Σ (an n × n matrix) are the mean and covariance of the prior flow rates within the entropy function, m_F (an (N_Θ̂ + N_Q̂) × 1 vector) and m_T (an h_c × 1 vector) are the values of the observations of the flow rates and potential differences, respectively, and Σ_F (an (N_Θ̂ + N_Q̂) × (N_Θ̂ + N_Q̂) matrix) and Σ_T (an h_c × h_c matrix) are their respective covariances. The resulting MaxEnt pdf with normalisation, Kirchhoff’s first and second law, potential difference and flow rate constraints is proportional to
$$\begin{aligned}
-\ln p^*(X, Y_F, Y_T) \simeq{} & \frac{1}{2} \left( \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} - \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix} - \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} \right) \\
& + \gamma \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix},
\end{aligned} \tag{45}$$
where γ = [α β λ η]. Combining terms of the same order, and assuming the covariance matrices are symmetric and positive definite, gives
$$-\ln p^*(X, Y_F, Y_T) \simeq \frac{1}{2} \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} - \left( \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix}^\top \begin{bmatrix} \Sigma^{-1} & 0 & 0 \\ 0 & \Sigma_F^{-1} & 0 \\ 0 & 0 & \Sigma_T^{-1} \end{bmatrix} - \gamma \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \right) \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} \tag{46}$$
and completing the square
$$\begin{aligned}
-\ln p^*(X, Y_F, Y_T) \simeq{} & \frac{1}{2} \left( \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} - \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix} + \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \gamma^\top \right)^\top \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix}^{-1} \\
& \times \left( \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} - \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix} + \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \gamma^\top \right).
\end{aligned} \tag{47}$$
The above form allows the mean to be obtained as
$$\begin{bmatrix} \langle X \rangle \\ \langle Y_F \rangle \\ \langle Y_T \rangle \end{bmatrix} = \int \begin{bmatrix} X \\ Y_F \\ Y_T \end{bmatrix} p^*(X, Y_F, Y_T)\, dX\, dY_F\, dY_T = \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix} - \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \gamma^\top. \tag{48}$$
Using the constraint equations, the Lagrange multipliers can be found from
$$\gamma^\top = \left( \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \right)^{-1} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix}. \tag{49}$$
Substituting Equation (49) into Equation (48) gives
$$\begin{bmatrix} \langle X \rangle \\ \langle Y_F \rangle \\ \langle Y_T \rangle \end{bmatrix} = \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix} - \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \left( \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \begin{bmatrix} \Sigma & 0 & 0 \\ 0 & \Sigma_F & 0 \\ 0 & 0 & \Sigma_T \end{bmatrix} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix}^\top \right)^{-1} \begin{bmatrix} C_X + C_{\bar{X}} & 0 & 0 \\ W_X + W_{\bar{X}} & 0 & 0 \\ F_X + F_{\bar{X}} & -I & 0 \\ T_X + T_{\bar{X}} & 0 & -I \end{bmatrix} \begin{bmatrix} m \\ m_F \\ m_T \end{bmatrix}. \tag{50}$$
Extracting the posterior means then gives
$$\langle X \rangle = m + \Sigma O^\top \left( \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \Sigma_F & 0 \\ 0 & 0 & 0 & \Sigma_T \end{bmatrix} + O \Sigma O^\top \right)^{-1} \left( \begin{bmatrix} 0 \\ 0 \\ m_F \\ m_T \end{bmatrix} - O m \right). \tag{51}$$
Applying the limit Σ_C → 0 and Σ_W → 0 to Equation (33), with m_F = Y_F and m_T = Y_T, recovers Equation (51). In consequence, the MaxEnt formulation with soft prior constraints (Section 3.1) and the Bayesian formulation (Section 2) give the same mean flow rate prediction (33), but with different covariance matrices.
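This equivalence is easily checked numerically. In the following sketch (again with arbitrary made-up matrices, not a network from the paper), the MaxEnt mean of Equation (51), with exact zero blocks for the Kirchhoff rows, matches the Bayesian mean of Equations (27) and (28) evaluated with vanishingly small Σ_C and Σ_W:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
m = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)

O_hard = rng.normal(size=(2, n))     # Kirchhoff rows (C and W blocks)
O_soft = rng.normal(size=(2, n))     # observation rows (F and T blocks)
O = np.vstack([O_hard, O_soft])      # full operator O, Eq. (22)
Sig_FT = np.diag([0.3, 0.6])         # observation covariances Sigma_F, Sigma_T
data = rng.normal(size=2)            # m_F and m_T (= Y_F, Y_T)
y = np.concatenate([np.zeros(2), data])

# MaxEnt with soft prior constraints, Eq. (51): exact zero blocks.
S51 = np.zeros((4, 4)); S51[2:, 2:] = Sig_FT
x51 = m + Sigma @ O.T @ np.linalg.inv(S51 + O @ Sigma @ O.T) @ (y - O @ m)

# Bayesian route, Eqs. (27)-(28), with tiny Sigma_C, Sigma_W for the deltas.
S_b = S51 + np.diag([1e-9, 1e-9, 0.0, 0.0])
Si = np.linalg.inv(S_b)
Sigma_p = np.linalg.inv(np.linalg.inv(Sigma) + O.T @ Si @ O)
x_bayes = Sigma_p @ (np.linalg.inv(Sigma) @ m + O.T @ Si @ y)

print(np.allclose(x51, x_bayes))     # True: identical posterior means
print(O_hard @ x51)                  # ~0: hard constraints met exactly
```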

4. MaxEnt Analysis with Soft Probabilistic Constraints

4.1. Formulation

The MaxEnt method can also incorporate soft constraints using a probabilistic representation of the observed data. To implement this, we define a pdf that expresses the uncertainty in the system over a reduced parameter set X, again consisting of a basis set of n flow rates selected from Ψ. The indices of X in Ψ are again given by the set ℬ, while their complement is given by the set 𝒩. Again, at least N − 1 basis flow rates must be chosen, but up to N + M can be chosen. The joint probability is defined to be:
$$p(X)\, dX = \operatorname{Prob}\left( X \leq \Upsilon_X \leq X + dX \right), \tag{52}$$
where Υ_X is the vector of random variables for X. We again assume that each of the flow rate and potential difference constraints is applied as a soft constraint; this does not restrict the method to soft constraints only, and strict constraints can also be applied. This choice of pdf gives the following relative entropy or negative Kullback–Leibler function [21], over the space of uncertainties used in this formulation:
$$H = -\int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} p(X) \ln \frac{p(X)}{q(X)}\, dX, \tag{53}$$
where q(X) is the prior pdf, and l_i and u_i are the lower and upper bounds of the i-th flow rate. The relative entropy is then maximised subject to the constraints on the system. The following constraints are always required:
  • Normalisation of probability:
    $$1 = \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} p(X)\, dX. \tag{54}$$
  • Kirchhoff’s first law, for the conservation of flow rates at each internal node, here imposed in the mean:
    $$0 = \left( C_X + C_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} X\, p(X)\, dX. \tag{55}$$
  • Kirchhoff’s second law, which requires the potential difference to vanish around each enclosed loop, again imposed in the mean:
    $$0 = \left( W_X + W_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} X\, p(X)\, dX. \tag{56}$$
We also allow for any of the following constraints:
  • A set of specified inflow/outflow and internal flow rate constraints, assuming the uncertainty is described by a Gaussian distribution:
    $$F_p = \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right)^\top \Sigma_F^{-1} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) \right)}{(2\pi)^{(N_{\hat{\Theta}} + N_{\hat{Q}})/2}\, |\Sigma_F|^{1/2}} \right) p(X)\, dX. \tag{57}$$
  • Potential difference constraints between pairs of nodes, again assuming the uncertainty is described by a Gaussian distribution:
    $$T_p = \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)^\top \Sigma_T^{-1} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \right)}{(2\pi)^{h_c/2}\, |\Sigma_T|^{1/2}} \right) p(X)\, dX. \tag{58}$$
After identifying the constraints, the entropy (53) is then maximised subject to Equations (54)–(56) and whichever of Equations (57) and (58) apply. Applying the calculus of variations, we form the Lagrangian:
$$\begin{aligned}
L ={} & -\int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} p(X) \ln \frac{p(X)}{q(X)}\, dX - \hat{\kappa} \left( \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} p(X)\, dX - 1 \right) \\
& - \alpha \left( C_X + C_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} X\, p(X)\, dX - \beta \left( W_X + W_{\bar{X}} \right) \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} X\, p(X)\, dX \\
& - \lambda \left( \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right)^\top \Sigma_F^{-1} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) \right)}{(2\pi)^{(N_{\hat{\Theta}} + N_{\hat{Q}})/2}\, |\Sigma_F|^{1/2}} \right) p(X)\, dX - F_p \right) \\
& - \eta \left( \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)^\top \Sigma_T^{-1} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \right)}{(2\pi)^{h_c/2}\, |\Sigma_T|^{1/2}} \right) p(X)\, dX - T_p \right),
\end{aligned} \tag{59}$$
where κ̂ (a scalar), α and β (row vectors), and λ and η (scalars) are the Lagrange multipliers for the normalisation, Kirchhoff’s first and second laws, flow rate and head loss constraints, respectively. At the extremum, the variation of L vanishes, δL = 0. Extremising Equation (59) by taking the functional derivative with respect to p(X) and combining integrals gives:
$$\delta L = 0 = \int_{l_1}^{u_1} \cdots \int_{l_n}^{u_n} \Bigg[ -\ln \frac{p(X)}{q(X)} - \kappa - \alpha \left( C_X + C_{\bar{X}} \right) X - \beta \left( W_X + W_{\bar{X}} \right) X - \lambda \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right)^\top \Sigma_F^{-1} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) \right)}{(2\pi)^{(N_{\hat{\Theta}} + N_{\hat{Q}})/2}\, |\Sigma_F|^{1/2}} \right) - \eta \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)^\top \Sigma_T^{-1} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \right)}{(2\pi)^{h_c/2}\, |\Sigma_T|^{1/2}} \right) \Bigg]\, \delta p\; dX, \tag{60}$$
where κ = κ ^ + 1 . Rearrangement gives the following solution for p ( X ) (the Boltzmann distribution):
$$p^*(X) = q(X) \exp\Bigg( -\kappa - \alpha \left( C_X + C_{\bar{X}} \right) X - \beta \left( W_X + W_{\bar{X}} \right) X - \lambda \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right)^\top \Sigma_F^{-1} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) \right)}{(2\pi)^{(N_{\hat{\Theta}} + N_{\hat{Q}})/2}\, |\Sigma_F|^{1/2}} \right) - \eta \ln \left( \frac{\exp\left( -\frac{1}{2} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)^\top \Sigma_T^{-1} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \right)}{(2\pi)^{h_c/2}\, |\Sigma_T|^{1/2}} \right) \Bigg). \tag{61}$$
This can be solved, in conjunction with the constraints (54)–(58), to give p*(X) and the Lagrange multipliers κ, α, β, λ and η. As the purpose of the soft constraints is to incorporate a distribution as a constraint, we take λ = −1 and η = −1.
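With this choice, the logarithmic terms in Equation (61) cancel against the exponential, so the two Gaussian pdfs multiply the prior directly, in the manner of Bayesian likelihood functions (a bridging step, made explicit here for clarity):
$$p^*(X) \propto q(X)\, e^{-\kappa - \alpha \left( C_X + C_{\bar{X}} \right) X - \beta \left( W_X + W_{\bar{X}} \right) X}\, \exp\left( -\tfrac{1}{2} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right)^\top \Sigma_F^{-1} \left( \left( F_X + F_{\bar{X}} \right) X - Y_F \right) \right) \exp\left( -\tfrac{1}{2} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right)^\top \Sigma_T^{-1} \left( \left( T_X + T_{\bar{X}} \right) X - Y_T \right) \right).$$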

4.2. Solution and Comparison to Bayesian Solution

If the prior is chosen to be proportional to Equation (6), the MaxEnt probability distribution with normalisation, Kirchhoff’s first and second law, potential difference and flow rate constraints is proportional to
$$\begin{aligned}
-\ln p^*(X) \simeq{} & \frac{1}{2} \left( X^\top \Sigma^{-1} X - X^\top \Sigma^{-1} m - m^\top \Sigma^{-1} X \right) + \alpha \left( C_X + C_{\bar{X}} \right) X + \beta \left( W_X + W_{\bar{X}} \right) X \\
& + \frac{1}{2} \left( X^\top \left( F_X + F_{\bar{X}} \right)^\top \Sigma_F^{-1} \left( F_X + F_{\bar{X}} \right) X - X^\top \left( F_X + F_{\bar{X}} \right)^\top \Sigma_F^{-1} Y_F - Y_F^\top \Sigma_F^{-1} \left( F_X + F_{\bar{X}} \right) X \right) \\
& + \frac{1}{2} \left( X^\top \left( T_X + T_{\bar{X}} \right)^\top \Sigma_T^{-1} \left( T_X + T_{\bar{X}} \right) X - X^\top \left( T_X + T_{\bar{X}} \right)^\top \Sigma_T^{-1} Y_T - Y_T^\top \Sigma_T^{-1} \left( T_X + T_{\bar{X}} \right) X \right).
\end{aligned} \tag{62}$$
Combining terms of the same order, and assuming the covariance matrices are symmetric and positive definite, gives
$$-\ln p^*(X) \simeq \frac{1}{2} X^\top \left( \Sigma^{-1} + \left( F_X + F_{\bar{X}} \right)^\top \Sigma_F^{-1} \left( F_X + F_{\bar{X}} \right) + \left( T_X + T_{\bar{X}} \right)^\top \Sigma_T^{-1} \left( T_X + T_{\bar{X}} \right) \right) X - \left( m^\top \Sigma^{-1} - \alpha \left( C_X + C_{\bar{X}} \right) - \beta \left( W_X + W_{\bar{X}} \right) + Y_F^\top \Sigma_F^{-1} \left( F_X + F_{\bar{X}} \right) + Y_T^\top \Sigma_T^{-1} \left( T_X + T_{\bar{X}} \right) \right) X, \tag{63}$$
let
$$\hat{O} = \begin{bmatrix} C_X + C_{\bar{X}} \\ W_X + W_{\bar{X}} \end{bmatrix}, \tag{64}$$
$$\tilde{O} = \begin{bmatrix} F_X + F_{\bar{X}} \\ T_X + T_{\bar{X}} \end{bmatrix}, \tag{65}$$
$$\tilde{S}^{-1} = \begin{bmatrix} \Sigma_F^{-1} & 0 \\ 0 & \Sigma_T^{-1} \end{bmatrix}, \tag{66}$$
$$\tilde{y} = \begin{bmatrix} Y_F \\ Y_T \end{bmatrix}, \tag{67}$$
so
$$-\ln p^*(X) \simeq \frac{1}{2} X^\top \left( \Sigma^{-1} + \tilde{O}^\top \tilde{S}^{-1} \tilde{O} \right) X - \left( m^\top \Sigma^{-1} - \begin{bmatrix} \alpha & \beta \end{bmatrix} \hat{O} + \tilde{y}^\top \tilde{S}^{-1} \tilde{O} \right) X, \tag{68}$$
and completing the square
$$-\ln p^*(X) \simeq \frac{1}{2} \left( X - \langle X \rangle \right)^\top \tilde{\Sigma}_P^{-1} \left( X - \langle X \rangle \right), \tag{69}$$
where
$$\langle X \rangle = \int X\, p^*(X)\, dX = \tilde{\Sigma}_P \left( \Sigma^{-1} m - \hat{O}^\top \begin{bmatrix} \alpha & \beta \end{bmatrix}^\top + \tilde{O}^\top \tilde{S}^{-1} \tilde{y} \right) \tag{70}$$
and
$$\tilde{\Sigma}_P = \left\langle X X^\top \right\rangle - \langle X \rangle \langle X \rangle^\top = \int X X^\top p^*(X)\, dX - \langle X \rangle \langle X \rangle^\top = \left( \Sigma^{-1} + \tilde{O}^\top \tilde{S}^{-1} \tilde{O} \right)^{-1}. \tag{71}$$
Using the Woodbury matrix identity [37] gives the posterior covariance matrix
$$\tilde{\Sigma}_P = \Sigma - \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \tilde{O} \Sigma. \tag{72}$$
The following algebra follows that of the previous derivation. Right-multiplying the inverse posterior covariance by Σ Õ^⊤ gives
$$\tilde{\Sigma}_P^{-1} \Sigma \tilde{O}^\top = \tilde{O}^\top + \tilde{O}^\top \tilde{S}^{-1} \tilde{O} \Sigma \tilde{O}^\top = \tilde{O}^\top \tilde{S}^{-1} \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right). \tag{73}$$
Left multiplying with the posterior covariance then gives
$$\Sigma \tilde{O}^\top = \tilde{\Sigma}_P \tilde{O}^\top \tilde{S}^{-1} \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right) \tag{74}$$
and obtaining Σ̃_P Õ^⊤ S̃^{-1} by right-multiplying by (S̃ + Õ Σ Õ^⊤)^{-1} gives
$$\Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} = \tilde{\Sigma}_P \tilde{O}^\top \tilde{S}^{-1}. \tag{75}$$
The posterior mean flow rates can now be expressed using Equation (70) by substituting Equations (72) and (75) to give
$$\langle X \rangle = \left( \Sigma - \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \tilde{O} \Sigma \right) \Sigma^{-1} m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \tilde{y} - \tilde{\Sigma}_P \hat{O}^\top \begin{bmatrix} \alpha & \beta \end{bmatrix}^\top, \tag{76}$$
$$\langle X \rangle = m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \left( \tilde{y} - \tilde{O} m \right) - \tilde{\Sigma}_P \hat{O}^\top \begin{bmatrix} \alpha & \beta \end{bmatrix}^\top. \tag{77}$$
Using the constraint equations, the Lagrange multipliers can be found from
$$\begin{bmatrix} \alpha & \beta \end{bmatrix}^\top = \left( \hat{O} \tilde{\Sigma}_P \hat{O}^\top \right)^{-1} \hat{O} \left( m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \left( \tilde{y} - \tilde{O} m \right) \right). \tag{78}$$
Substituting Equation (78) into Equation (77) gives the posterior means
$$\langle X \rangle = m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \left( \tilde{y} - \tilde{O} m \right) - \tilde{\Sigma}_P \hat{O}^\top \left( \hat{O} \tilde{\Sigma}_P \hat{O}^\top \right)^{-1} \hat{O} \left( m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \left( \tilde{y} - \tilde{O} m \right) \right) \tag{79}$$
or
$$\langle X \rangle = \left( I - \tilde{\Sigma}_P \hat{O}^\top \left( \hat{O} \tilde{\Sigma}_P \hat{O}^\top \right)^{-1} \hat{O} \right) \left( m + \Sigma \tilde{O}^\top \left( \tilde{S} + \tilde{O} \Sigma \tilde{O}^\top \right)^{-1} \left( \tilde{y} - \tilde{O} m \right) \right). \tag{80}$$
As is evident, the second bracketed factor in Equation (80) (equivalently, the first term of Equation (79)) is of similar structure to Equations (33) and (51), although it contains only the parameters related to the soft probabilistic constraints. The first bracketed factor is a projection which accounts for the interaction between the hard moment constraints and the soft probabilistic constraints. If all constraints were applied as soft probabilistic constraints, Equation (33) would be obtained. Numerical experiments suggest that the means obtained in Equation (80) are equal to the Bayesian posterior means of Equation (33), in several examples considered, but the covariances are different.
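One such numerical experiment can be sketched as follows, with arbitrary made-up matrices standing in for a network; the agreement of the two means reproduces the behaviour reported above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
m = rng.normal(size=n)
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)

O_hat = rng.normal(size=(2, n))      # hard mean constraints (C, W rows)
O_til = rng.normal(size=(2, n))      # soft probabilistic rows (F, T)
S_til = np.diag([0.3, 0.6])          # Sigma_F, Sigma_T
y_til = rng.normal(size=2)           # stacked Y_F, Y_T

# Second MaxEnt method: Eqs. (72) and (80).
K = np.linalg.inv(S_til + O_til @ Sigma @ O_til.T)
Sig_P = Sigma - Sigma @ O_til.T @ K @ O_til @ Sigma
v = m + Sigma @ O_til.T @ K @ (y_til - O_til @ m)
P = Sig_P @ O_hat.T @ np.linalg.inv(O_hat @ Sig_P @ O_hat.T) @ O_hat
x80 = (np.eye(n) - P) @ v

# Bayesian mean, Eq. (33), with zero-covariance blocks for the hard rows.
O = np.vstack([O_hat, O_til])
S = np.zeros((4, 4)); S[2:, 2:] = S_til
y = np.concatenate([np.zeros(2), y_til])
x33 = m + Sigma @ O.T @ np.linalg.inv(S + O @ Sigma @ O.T) @ (y - O @ m)

print(np.allclose(x80, x33))         # True in this example
```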

5. Discussion

The MaxEnt and Bayesian methods rest on different theoretical foundations but are both able to predict flows on networks by updating the prior belief to the posterior with the inclusion of new information in the form of constraints or uncertain data. This study compares the application of Bayesian inference and the MaxEnt method for the analysis of flow networks, for the latter using soft constraints—included in the prior or imposed as probabilistic constraints—in addition to standard moment constraints. It is shown that both the Bayesian method and MaxEnt method with soft prior constraints, implemented using a multidimensional Gaussian prior pdf, infer the same mean flow rates but different covariance matrices. In the Bayesian method, the interactions between variables are applied through the likelihood function, using second or higher-order cross-terms within the posterior pdf. In contrast, the MaxEnt method incorporates interactions between variables using Lagrange multipliers, avoiding second-order correlation terms in the posterior covariance. The MaxEnt method with soft prior constraints therefore has a numerical advantage in its integrations, in that the covariance terms are avoided.
In contrast, the second MaxEnt method with probabilistic and moment constraints is shown to give a posterior mean of similar, but not identical, structure to the other two methods. Due to the mixture of constraint types, some of the interactions between variables are incorporated in the Lagrange multipliers and some are incorporated in the covariance matrix, leading to a more complicated formulation.
For both MaxEnt formulations given herein, the equivalence between the posterior means inferred by the Bayesian and MaxEnt methods is dependent on the choice of a multidimensional Gaussian prior and its parameterisation. Further research is required to classify the effect of other prior distributions on the MaxEnt and Bayesian formulations, and whether these lead to equivalences between the means or higher-order moments of the inferred posterior pdf.

Acknowledgments

This project acknowledges funding support from the Australian Research Council Discovery Projects Grant DP140104402, Go8/DAAD Australia-Germany Joint Research Cooperation Scheme RG123832 and the French Agence Nationale de la Recherche Chair of Excellence (TUCOROM) and the Institute Prime, Poitiers, France.

Author Contributions

Steven H. Waldrip conducted the analysis and prepared this manuscript for this work, under the guidance and supervision of Robert K. Niven. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Sivia, D.; Skilling, J. Data Analysis: A Bayesian Tutorial, 2nd ed.; Oxford University Press: Oxford, UK, 2006. [Google Scholar]
  2. Rougier, J.; Goldstein, M. A Bayesian analysis of fluid flow in pipe-lines. J. R. Stat. Soc. 2001, 50, 77–93. [Google Scholar] [CrossRef]
  3. Savic, D.A.; Kapelan, Z.S.; Jonkergouw, P.M. Quo vadis water distribution model calibration? Urban Water J. 2009, 6, 3–22. [Google Scholar] [CrossRef]
  4. Hutton, C.J.; Kapelan, Z.; Vamvakeridou-Lyroudia, L.; Savic, D. Real-time demand estimation in water distribution systems under uncertainty. In Proceedings of the 14th Water Distribution Systems Analysis Conference (WDSA 2012), Adelaide, Australia, 24–27 September 2012; pp. 1374–1385.
  5. Hutton, C.; Kapelan, Z. Real-time Burst Detection in Water Distribution Systems Using a Bayesian Demand Forecasting Methodology. Procedia Eng. 2015, 119, 13–18. [Google Scholar] [CrossRef] [Green Version]
  6. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
  7. Jaynes, E.T. Information Theory and Statistical Mechanics. II. Phys. Rev. 1957, 108, 171–190. [Google Scholar] [CrossRef]
  8. Kapur, J.N.; Kesavan, H.K. Entropy Optimization Principles with Applications; Academic Press: Boston, MA, USA, 1992. [Google Scholar]
  9. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  10. Pressé, S.; Ghosh, K.; Lee, J.; Dill, K.A. Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013, 85, 1115–1141. [Google Scholar] [CrossRef]
  11. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 623–656. [Google Scholar] [CrossRef]
  12. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37. [Google Scholar] [CrossRef]
  13. Caticha, A. Relative entropy and inductive inference. In Proceedings of the 23rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Jackson Hole, Wyoming, 3–8 August 2003.
  14. Boltzmann, L. Über die Beziehung zwischen dem zweiten Hauptsatz der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung respektive den Sätzen über das Wärmegleichgewicht (On the Relationship between the Second Fundamental Theorem of the Mechanical Theory of Heat and Probability Calculations Regarding the Conditions for Thermal Equilibrium). Wiener Berichte 1877, 2, 373–435. (In German) [Google Scholar]
  15. Sharp, K.; Matschinsky, F. Translation of Ludwig Boltzmann’s Paper “On the Relationship between the Second Fundamental Theorem of the Mechanical Theory of Heat and Probability Calculations Regarding the Conditions for Thermal Equilibrium” Sitzungberichte der Kaiserlichen Akademie der Wissenschaften. Entropy 2015, 17, 1971–2009. [Google Scholar]
  16. Jaynes, E.T. Prior Probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241. [Google Scholar] [CrossRef]
  17. Niven, R.K. Combinatorial entropies and statistics. Eur. Phys. J. B 2009, 70, 49–63. [Google Scholar] [CrossRef]
  18. Planck, M. Ueber das Gesetz der Energieverteilung im Normalspectrum. Annalen der Physik 1901, 309, 553–563. (In German) [Google Scholar] [CrossRef]
  19. Tribus, M. Thermostatics and Thermodynamics: An Introduction to Energy, Information and States of Matter, with Engineering Applications; Van Nostrand: New York, NY, USA, 1961. [Google Scholar]
  20. Ellis, R.S. Entropy, Large Deviations, and Statistical Mechanics; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  21. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  22. Waldrip, S.H.; Niven, R.K.; Abel, M.; Schlegel, M. Maximum entropy analysis of hydraulic pipe networks. In Proceedings of the 33rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2013), Canberra, Australia, 15–20 December 2013; pp. 180–186.
  23. Waldrip, S.H.; Niven, R.K.; Abel, M.; Schlegel, M.; Noack, B.R. MaxEnt analysis of a water distribution network in Canberra, ACT, Australia. In Proceedings of the 2014 Bayesian Inference and Maximum Entropy Methods In Science and Engineering (MaxEnt 2014), Amboise, France, 21–26 September 2014; pp. 479–486.
  24. Waldrip, S.H.; Niven, R.K.; Abel, M.; Schlegel, M. Maximum Entropy Analysis of Hydraulic Pipe Flow Networks. J. Hydraul. Eng. 2016, 142, 04016028. [Google Scholar] [CrossRef]
  25. Waldrip, S.; Niven, R.K.; Abel, M.; Schlegel, M. Reduced-Parameter Method for Maximum Entropy Analysis of Hydraulic Pipe Flow Networks. J. Hydraul. Eng. 2016. submitted. [Google Scholar]
  26. Waldrip, S.H. The Probabilistic Analysis of Flow Networks. PhD Thesis, The University of New South Wales, Canberra, Australia, 2017. [Google Scholar]
  27. Niven, R.K.; Abel, M.; Schlegel, M.; Waldrip, S.H. Maximum entropy analysis of flow networks. In Proceedings of the 33rd International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2013), Canberra, Australia, 15–20 December 2013; pp. 159–164.
  28. Niven, R.K.; Abel, M.; Schlegel, M.; Waldrip, S.H. Maximum entropy analysis of flow and reaction networks. In Proceedings of the 2014 Bayesian Inference and Maximum Entropy Methods In Science and Engineering (MaxEnt 2014), Amboise, France, 21–26 September 2014; pp. 271–278.
  29. Niven, R.; Waldrip, S.; Abel, M.; Schlegel, M.; Noack, B. Maximum Entropy Analysis of Flow Networks with Nonlinear Constraints. In Proceedings of the 2nd International Electronic Conference on Entropy and Its Applications, Basel, Switzerland, 15–30 November 2015; p. A012.
  30. Williams, P.M. Bayesian Conditionalisation and the Principle of Minimum Information. Br. J. Philos. Sci. 1980, 31, 131–144. [Google Scholar] [CrossRef]
  31. Caticha, A.; Giffin, A. Updating Probabilities. In Proceedings of the 26th International Workshop on Bayesian Inference and Maximum Entropy Methods, Paris, France, 8–13 July 2006; pp. 31–42.
  32. Giffin, A.; Caticha, A. Updating probabilities with data and moments. In Proceedings of the 27th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2007), Saratoga Springs, NY, USA, 8–13 July 2007; pp. 74–84.
  33. Giffin, A. Maximum Entropy: The Universal Method for Inference. PhD Thesis, University at Albany, State University of New York, Albany, NY, USA, 2008. [Google Scholar]
  34. Hennig, P.; Kiefel, M. Quasi-Newton Methods: A New Direction. In Proceedings of the 29th International Conference on Machine Learning (ICML-12), Edinburgh, Scotland, 26 June–1 July 2012; pp. 25–32.
  35. Hennig, P.; Kiefel, M. Quasi-Newton Methods: A New Direction. J. Mach. Learn. Res. 2013, 14, 843–865. [Google Scholar]
  36. Waldrip, S.H.; Niven, R.K. Maximum Entropy Derivation of Quasi-Newton Methods. SIAM J. Optim. 2016, 26, 2495–2511. [Google Scholar] [CrossRef]
  37. Woodbury, M.A. Inverting Modified Matrices; Memorandum Report 42; Princeton University: Princeton, NJ, USA, 1950. [Google Scholar]
