**A Hybrid Inversion Scheme Combining Markov Chain Monte Carlo and Iterative Methods for Determining Optical Properties of Random Media**

#### **Yu Jiang <sup>1</sup>, Yoko Hoshi <sup>2</sup>, Manabu Machida <sup>2,∗</sup> and Gen Nakamura <sup>3</sup>**


Received: 31 July 2019; Accepted: 21 August 2019; Published: 24 August 2019

**Abstract:** Near-infrared spectroscopy (NIRS), including diffuse optical tomography, is an imaging modality which makes use of diffuse light propagation in random media. When optical properties of a random medium are investigated from boundary measurements of reflected or transmitted light, iterative inversion schemes such as the Levenberg–Marquardt algorithm are known to fail when initial guesses are not close enough to the true value of the coefficient to be reconstructed. In this paper, we investigate how this weakness of iterative schemes is overcome using Markov chain Monte Carlo. Using time-resolved measurements performed against a polyurethane-based phantom, we present a case in which the Levenberg–Marquardt algorithm fails but the proposed hybrid method works well. Then, with a toy model of diffuse optical tomography, we illustrate that the Levenberg–Marquardt method fails when it is trapped by a local minimum, whereas the hybrid method can escape from local minima by using the Metropolis–Hastings Markov chain Monte Carlo algorithm until it reaches the valley of the global minimum. The proposed hybrid scheme can be applied to different inverse problems in NIRS that are solved iteratively. We find that for both numerical and phantom experiments, optical properties such as the absorption and reduced scattering coefficients can be retrieved without being trapped by a local minimum when the Monte Carlo simulation is run for only about 100 steps before switching to an iterative method. The hybrid method is compared with simulated annealing. Although the Metropolis–Hastings MCMC arrives at the steady state after about 10,000 Monte Carlo steps, in the hybrid method the Monte Carlo simulation can be stopped well before the burn-in time.

**Keywords:** near-infrared spectroscopy; diffuse light; inverse problems; optical tomography

#### **1. Introduction**

In near-infrared spectroscopy (NIRS), we estimate optical properties of biological tissue by solving inverse diffuse problems [1,2]. Such inverse problems are commonly solved by means of iterative methods. In the case of a homogeneous medium, the absorption and reduced scattering coefficients of the medium can be obtained with iterative methods such as the Levenberg–Marquardt algorithm [3,4] from time-resolved measurements (for example, see the review article [5]). Neuroimaging of the human brain by NIRS through neurovascular coupling has been developed and is called functional NIRS (fNIRS) [6–9]. Since the region of interest on the head can be small, it is possible to assume the simple geometry of a half space. Quantitative measurements of inter-regional differences in neuronal activity require accurate estimates of optical properties.

In Ref. [10], the Nelder–Mead simplex method was used to retrieve optical parameters in layered tissue. Heterogeneity of optical properties can be obtained by diffuse optical tomography [1,11,12]. Iterative numerical schemes are used to minimize the cost function when solving these inverse problems. A gradient-based approach [13] was used to detect breast cancer [14]. The brain activity of a newborn infant was investigated [15] with diffuse optical tomography by TOAST (temporal optical absorption and scattering tomography) [16,17], in which iterative algorithms such as nonlinear conjugate gradients, the damped Gauss–Newton method, and the Levenberg–Marquardt method are implemented. Diffuse optical tomography was performed on human lower legs and a forearm with the algebraic reconstruction algorithm in the framework of the modified generalized pulse spectrum technique [18]. See Refs. [19,20] for numerical techniques for diffuse optical tomography. For these iterative numerical schemes to work, it is important to choose a good initial guess for the iteration.

Solving inverse problems by the Bayesian approach has been sought as an alternative way. In Ref. [21], a novel use of the Bayesian approach was considered to take modeling error into account. The Bayesian inversion with the Metropolis–Hastings Markov chain Monte Carlo was used in Refs. [22,23]. The Bayesian approach was used to determine optical parameters of the human head [24]. Although the Markov chain Monte Carlo (MCMC) approach is in principle able to escape from local minima, it is computationally time consuming. Hence, despite the above-mentioned efforts, the use of Monte Carlo for inverse problems in NIRS has been extremely limited.

In this paper, we shed light on Markov chain Monte Carlo once again by combining it with an iterative scheme, and investigate its use in NIRS. In particular, we test a hybrid numerical scheme of Markov chain Monte Carlo and an iterative method. In the proposed hybrid scheme, the Markov chain Monte Carlo algorithm is first used to provide an initial guess by jumping around the landscape of the cost function, and then an iterative method is used after the initial large fluctuation. Thus, the proposed method realizes a fast reconstruction: in the beginning, the values obtained at successive Monte Carlo steps jump around the landscape of the cost function, but the MCMC simulation is necessary only for the first 100 steps or so. The hybrid method then switches to an iterative scheme and reconstructs optical properties by searching for the global minimum. The computation time of the iterative scheme is negligible compared with that of the Monte Carlo simulation. Since the Monte Carlo simulation can be stopped well before the burn-in time, the proposed scheme is at least ten times faster than simulated annealing, which is implemented by the naive use of the Metropolis–Hastings Markov chain Monte Carlo.

The rest of the paper is organized as follows. We develop diffusion theory in Section 2. A polyurethane-based phantom and numerical phantom are also described in Section 2. Section 3 is devoted to experimental and numerical results. Finally, discussion is given in Section 4.

#### **2. Materials and Methods**

#### *2.1. Diffusion Theory*

#### 2.1.1. Diffuse Light in Three Dimensions

Let us suppose that a random medium occupies the three-dimensional half space. We assume that the random medium is characterized by the diffusion coefficient *D* and absorption coefficient *μ<sub>a</sub>*. We have *D* = 1/(3*μ*′<sub>*s*</sub>), where *μ*′<sub>*s*</sub> is the reduced scattering coefficient. Position in the half space (−∞ < *x* < ∞, −∞ < *y* < ∞, 0 < *z* < ∞) is denoted by **r** = (*ρ*, *z*), *ρ* = (*x*, *y*). Let *t* denote time and let *c* be the speed of light in the medium. We assume an incident beam on the boundary at the origin in the *x*-*y* plane. The energy density *u* of light in the medium obeys the following diffusion equation.

$$
\left(\frac{1}{c}\frac{\partial}{\partial t} - D\Delta + \mu\_a\right)u(\mathbf{r}, t) = 0,\tag{1}
$$

with the boundary condition

$$-\ell \frac{\partial}{\partial z} u(\mathbf{r}, t) + u(\mathbf{r}, t) = \delta(x) \delta(y) q(t), \tag{2}$$

and the initial condition *u*(**r**, 0) = 0. The boundary condition is imposed on the plane *z* = 0, and its right-hand side is the incident laser beam which illuminates the medium at the origin (0, 0, 0) with the temporal profile *q*(*t*). In the phantom experiment described below, the incident light enters the phantom in the positive *z* direction. The extrapolation distance ℓ is a nonnegative constant. If we consider the diffuse surface reflection, we have [25]

$$\ell = 2D \frac{1 + r\_d}{1 - r\_d}, \tag{3}$$

where the internal reflection parameter *r<sub>d</sub>* is given by [26] *r<sub>d</sub>* = −1.4399*n*<sup>−2</sup> + 0.7099*n*<sup>−1</sup> + 0.6681 + 0.0636*n*. Let us consider the corresponding surface Green's function *G<sub>s</sub>*(**r**, *t*; *ρ*′, *s*), which satisfies Equation (1) and the boundary condition

$$\left(-\ell\frac{\partial}{\partial z} + 1\right) G\_s(\mathbf{r}, t; \boldsymbol{\rho}', s) = \delta(x - x')\delta(y - y')\delta(t - s),\tag{4}$$

together with the initial condition *G<sub>s</sub>*(**r**, 0; *ρ*′, *s*) = 0. We obtain [27–30]

$$\begin{split} G\_s(\mathbf{r},t;\boldsymbol{\rho}',s) &= \frac{cD}{\ell} \left[ \frac{2e^{-\mu\_a c(t-s)}}{\left(4\pi Dc(t-s)\right)^{3/2}} e^{-\frac{(x-x')^{2}+(y-y')^{2}}{4Dc(t-s)}} e^{-\frac{z^{2}}{4Dc(t-s)}} \right. \\ &\left. - \frac{e^{-\mu\_a c(t-s)}}{4\pi\ell Dc(t-s)} e^{-\frac{(x-x')^{2}+(y-y')^{2}}{4Dc(t-s)}} e^{\frac{z}{\ell}} e^{\frac{Dc(t-s)}{\ell^{2}}} \operatorname{erfc}\left(\frac{z+2Dc(t-s)/\ell}{\sqrt{4Dc(t-s)}}\right) \right], \quad t > s, \end{split} \tag{5}$$

with *G<sub>s</sub>*(**r**, *t*; *ρ*′, *s*) = 0 for *t* ≤ *s*. The complementary error function is defined by erfc(*x*) = (2/√*π*) ∫<sub>*x*</sub><sup>∞</sup> exp(−*t*<sup>2</sup>) *dt*. Let **r**<sub>*d*</sub> be a point in the *x*-*y* plane with |**r**<sub>*d*</sub>| > 0. We obtain

$$\begin{split} u(\mathbf{r}\_d, t) &= \int\_0^t G\_s(\mathbf{r}\_d, t; \mathbf{0}, s)\, q(s) \, ds \\ &= \frac{cD}{\ell}\int\_0^t e^{-\mu\_a c(t-s)} e^{-\frac{|\mathbf{r}\_d|^2}{4Dc(t-s)}} \frac{2q(s)}{\left(4\pi Dc(t-s)\right)^{3/2}} \left[1 - \frac{\sqrt{4\pi Dc(t-s)}}{2\ell} e^{\frac{Dc(t-s)}{\ell^2}} \operatorname{erfc}\left(\frac{\sqrt{Dc(t-s)}}{\ell}\right)\right] \, ds. \end{split} \tag{6}$$

This *u*(**r**<sub>*d*</sub>, *t*) in Equation (6) is used for the calculation in Section 3.1.
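As a concrete illustration, Equation (6) can be evaluated for a delta-pulse source *q*(*t*) = *δ*(*t*), for which the time integral drops out. The following sketch is ours, not part of the original analysis code; it assumes mm/ps units, and the function name and defaults are illustrative.

```python
import math

def tpsf_halfspace(t, rho_d, mu_a, mu_s_prime, n=1.51):
    """Evaluate Eq. (6) for a delta-pulse source q(t) = delta(t).

    t          : detection time [ps]
    rho_d      : source-detector distance on the boundary [mm]
    mu_a       : absorption coefficient [1/mm]
    mu_s_prime : reduced scattering coefficient [1/mm]
    """
    c = 0.299792458 / n                  # speed of light in the medium [mm/ps]
    D = 1.0 / (3.0 * mu_s_prime)         # diffusion coefficient [mm]
    r_d = -1.4399 / n**2 + 0.7099 / n + 0.6681 + 0.0636 * n
    ell = 2.0 * D * (1.0 + r_d) / (1.0 - r_d)   # extrapolation distance, Eq. (3)
    tau = t                              # t - s with s = 0 for a delta pulse
    gauss = 2.0 * math.exp(-mu_a * c * tau - rho_d**2 / (4.0 * D * c * tau)) \
        / (4.0 * math.pi * D * c * tau) ** 1.5
    x = math.sqrt(D * c * tau) / ell
    # bracket of Eq. (6); the naive product exp(x^2)*erfc(x) is finite
    # but can overflow for very late times
    bracket = 1.0 - math.sqrt(4.0 * math.pi * D * c * tau) / (2.0 * ell) \
        * math.exp(x * x) * math.erfc(x)
    return (c * D / ell) * gauss * bracket
```

With phantom-like values *μ<sub>a</sub>* = 0.02 mm<sup>−1</sup>, *μ*′<sub>*s*</sub> = 1.0 mm<sup>−1</sup>, and *ρ<sub>d</sub>* = 13 mm, the returned temporal point spread function decays at late times, as a time-resolved reflectance should.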

#### 2.1.2. Diffuse Light in Two Dimensions

For the later purpose of a numerical experiment, we here consider light propagation in two dimensions. Let us suppose that the two-dimensional half space of positive *y* in the *x*-*y* plane is occupied by a random medium. The energy density *u* of light at position *ρ* = (*x*, *y*) in the medium due to the incident beam on the boundary (the *x*-axis) at *ρ<sup>i</sup><sub>s</sub>* (*i* = 1, ... , *M<sub>s</sub>*) obeys the following diffusion equation.

$$\left(\frac{1}{c}\frac{\partial}{\partial t} - D\Delta + \mu\_a(\rho)\right)u(\rho, t; \rho\_s^i) = 0,\tag{7}$$

with the boundary condition

$$-\ell \frac{\partial}{\partial y} u(\rho, t; \rho\_s^i) + u(\rho, t; \rho\_s^i) = \delta(x - x\_s^i) \delta(t), \tag{8}$$

and the initial condition *u*(*ρ*, 0; *ρ<sup>i</sup><sub>s</sub>*) = 0. The incident laser beam on the right-hand side of the boundary condition is assumed to be a pulse at the position *ρ<sup>i</sup><sub>s</sub>* = (*x<sup>i</sup><sub>s</sub>*, 0). We suppose that the diffusion coefficient *D* is a positive constant but *μ<sub>a</sub>* varies in space.

Below we develop the Rytov approximation. Let us write *μ<sub>a</sub>*(*ρ*) ≥ 0 as

$$
\mu\_a(\rho) = \mu\_{a0} + \delta \mu\_a(\rho),
\tag{9}
$$

where *μ*<sub>*a*0</sub> is a constant and the perturbation *δμ<sub>a</sub>*(*ρ*) varies in space. We note the relation,

$$u(\rho,t;\rho\_s^i) = u\_0(\rho,t;\rho\_s^i) - \int\_0^t \int\_0^\infty \int\_{-\infty}^\infty G(\rho,t;\rho',s)\,\delta\mu\_a(\rho')\,u(\rho',s;\rho\_s^i) \,dx'\,dy'\,ds,\tag{10}$$

where *u*<sub>0</sub>(*ρ*, *t*; *ρ<sup>i</sup><sub>s</sub>*) is the solution to the diffusion Equation (7) with the coefficient *μ<sub>a</sub>*(*ρ*) replaced by the zeroth-order coefficient *μ*<sub>*a*0</sub>. We introduce the Green's function *G*(*ρ*, *t*; *ρ*′, *s*), which satisfies

$$\left(\frac{1}{c}\frac{\partial}{\partial t} - D\Delta + \mu\_{a0}\right)G = \delta(\rho - \rho')\delta(t - s),\tag{11}$$

with the boundary condition −ℓ *∂G*/*∂y* + *G* = 0 at *y* = 0, and the initial condition *G* = 0 at *t* = 0. The above relation (10) can be directly verified. By recursive substitution, we obtain *u* as

$$u(\rho,t;\rho\_s^i) = u\_0(\rho,t;\rho\_s^i) - \int\_0^t \int\_0^\infty \int\_{-\infty}^\infty G(\rho,t;\rho',s)\delta\mu\_a(\rho')u\_0(\rho',s;\rho\_s^i)\,dx'\,dy'\,ds + O((\delta\mu\_a)^2). \tag{12}$$

By neglecting higher-order terms, assuming that *δμ<sub>a</sub>* is small, we arrive at the (first) Born approximation [31], in which *u* is given by

$$u(\rho,t;\rho\_s^i) = u\_0(\rho,t;\rho\_s^i) - \int\_0^t \int\_0^{\infty} \int\_{-\infty}^{\infty} G(\rho,t;\rho',s) \delta\mu\_a(\rho') u\_0(\rho',s;\rho\_s^i) \,dx'\,dy'\,ds.\tag{13}$$

We note that the Green's function is obtained as [27–30]

$$\begin{split} G(\rho,t;\rho',s) &= \frac{e^{-\mu\_{a0}c(t-s)}}{4\pi D(t-s)} e^{-\frac{(x-x')^2}{4Dc(t-s)}} \left[ e^{-\frac{(y-y')^2}{4Dc(t-s)}} + e^{-\frac{(y+y')^2}{4Dc(t-s)}} \right. \\ &\quad \left. - \frac{\sqrt{4\pi Dc(t-s)}}{\ell} e^{-\frac{(y+y')^2}{4Dc(t-s)}} e^{\left(\frac{y+y'+2Dc(t-s)/\ell}{\sqrt{4Dc(t-s)}}\right)^2} \operatorname{erfc}\left(\frac{y+y'+2Dc(t-s)/\ell}{\sqrt{4Dc(t-s)}}\right) \right], \end{split} \tag{14}$$

for *t* > *s*, and otherwise *G*(*ρ*, *t*; *ρ*′, *s*) = 0. Moreover, we obtain

$$u\_0(\rho, t; \rho\_s^i) = \frac{e^{-\mu\_{a0}ct}}{2\pi Dt} e^{-\frac{(x - x\_s^i)^2 + y^2}{4Dct}} \left[ 1 - \frac{\sqrt{\pi Dct}}{\ell} e^{\left(\frac{y + 2Dct/\ell}{\sqrt{4Dct}}\right)^2} \operatorname{erfc}\left(\frac{y + 2Dct/\ell}{\sqrt{4Dct}}\right) \right]. \tag{15}$$

The above expression of *u*<sub>0</sub> is similar to the formula in Equation (6) but does not have a time integral because in this case we assumed the delta function *δ*(*t*) for the incident beam.

To obtain the expression of the Rytov approximation, we introduce *ψ*<sub>0</sub>, *ψ*<sub>1</sub> as [31]

$$
u\_0 = e^{\psi\_0}, \qquad u = e^{\psi\_0 + \psi\_1}. \tag{16}

*Appl. Sci.* **2019**, *9*, 3500

By plugging the above expressions of *u*, *u*<sub>0</sub> into the Born approximation and neglecting terms of *O*((*δμ<sub>a</sub>*)<sup>2</sup>), we obtain

$$\begin{split} e^{\psi\_1} &= 1 - e^{-\psi\_0} \int\_{0}^{t} \int\_{0}^{\infty} \int\_{-\infty}^{\infty} G(\rho,t; \rho', s) \delta \mu\_{a}(\rho') u\_{0}(\rho', s; \rho\_{s}^{i}) \, dx'\,dy'\,ds \\ &= \exp\left[ -e^{-\psi\_0} \int\_{0}^{t} \int\_{0}^{\infty} \int\_{-\infty}^{\infty} G(\rho, t; \rho', s) \delta \mu\_{a}(\rho') u\_{0}(\rho', s; \rho\_{s}^{i}) \, dx'\,dy'\,ds \right]. \end{split} \tag{17}$$

The (first) Rytov approximation is thus derived as

$$u(\rho,t;\rho\_s^i) = u\_0(\rho,t;\rho\_s^i)\exp\left[-\frac{1}{u\_0(\rho,t;\rho\_s^i)}\int\_0^t \int\_0^\infty \int\_{-\infty}^\infty G(\rho,t;\rho',s)\delta\mu\_a(\rho')u\_0(\rho',s;\rho\_s^i)\,dx'\,dy'\,ds\right]. \tag{18}$$

Let us define

$$\begin{split} g(y, t; y', s) &= \frac{1}{4\pi D(t - s)} e^{-\frac{(y + y')^2}{4Dc(t - s)}} \left[ 1 + e^{\frac{(y + y')^2 - (y - y')^2}{4Dc(t - s)}} - \frac{\sqrt{4\pi Dc(t - s)}}{\ell} \right. \\ &\times e^{\left(\frac{y + y'}{2\sqrt{Dc(t - s)}} + \frac{\sqrt{Dc(t - s)}}{\ell}\right)^2} \operatorname{erfc}\left(\frac{y + y'}{2\sqrt{Dc(t - s)}} + \frac{\sqrt{Dc(t - s)}}{\ell}\right) \Big]. \end{split} \tag{19}$$

Then we have

$$\int\_{0}^{t} G(\rho, t; \rho', s)\, u\_{0}(\rho', s; \rho\_s^i) \, ds = e^{-\mu\_{a0}ct} \int\_{0}^{t} e^{-\frac{(x-x')^{2}}{4Dc(t-s)}} e^{-\frac{(x'-x\_s^i)^{2}}{4Dcs}} \, g(y, t; y', s)\, g(y', s; 0, 0) \, ds. \tag{20}$$

Therefore, Equation (18) can be rewritten as

$$\begin{split} u(\rho,t;\rho\_{s}^{i}) &= u\_{0}(\rho,t;\rho\_{s}^{i})\exp\left[-\frac{e^{-\mu\_{a0}ct}}{u\_{0}(\rho,t;\rho\_{s}^{i})}\int\_{0}^{\infty}\int\_{-\infty}^{\infty}\delta\mu\_{a}(\rho') \right. \\ &\left. \times \left(\int\_{0}^{t}e^{-\frac{(x-x')^{2}}{4Dc(t-s)}}e^{-\frac{(x'-x\_{s}^{i})^{2}}{4Dcs}}g(y,t;y',s)\,g(y',s;0,0)\,ds\right) \,dx'\,dy'\right]. \end{split} \tag{21}$$

This expression (21) is used to compute the forward data in Section 3.2.

#### *2.2. Inverse Problems by an Iterative Scheme*

We suppose that, on the surface of a two- or three-dimensional random medium, for each source at *ρ<sup>i</sup><sub>s</sub>* or **r**<sup>*i*</sup><sub>*s*</sub> (*i* = 1, ... , *M<sub>s</sub>*), exiting light is detected at *ρ<sup>j</sup><sub>d</sub>* or **r**<sup>*j*</sup><sub>*d*</sub> (*j* = 1, ... , *M<sub>d</sub>*) and is measured at times *t<sup>k</sup>* (*k* = 1, ... , *M<sub>t</sub>*). Let *y<sub>ijk</sub>* be the measured data, whereas *u* is a solution to the diffusion equation. Let us suppose that *u* depends on a vector **a** which contains unknown parameters. We wish to reconstruct **a**. In Section 3.1, **a** = (*μ<sub>a</sub>*, *D*), and *a* is a scalar (a one-dimensional vector) in Section 3.2. For example, in the former case the solution *u* depends on **a** since the calculated value of *u* depends on *μ<sub>a</sub>*, *D*.

Let us introduce vectors **U** and **F**(**a**) of dimension *MsMdMt* as

$$\{\mathbf{U}\}\_{ijk} = y\_{ijk}, \qquad \{\mathbf{F}(\mathbf{a})\}\_{ijk} = u(\rho\_d^j, t^k; \rho\_s^i; \mathbf{a}) \text{ or } u(\mathbf{r}\_d^j, t^k; \mathbf{r}\_s^i; \mathbf{a}), \tag{22}$$

where we wrote *u*(*ρ<sup>j</sup><sub>d</sub>*, *t<sup>k</sup>*; *ρ<sup>i</sup><sub>s</sub>*) = *u*(*ρ<sup>j</sup><sub>d</sub>*, *t<sup>k</sup>*; *ρ<sup>i</sup><sub>s</sub>*; **a**) and *u*(**r**<sup>*j*</sup><sub>*d*</sub>, *t<sup>k</sup>*; **r**<sup>*i*</sup><sub>*s*</sub>) = *u*(**r**<sup>*j*</sup><sub>*d*</sub>, *t<sup>k</sup>*; **r**<sup>*i*</sup><sub>*s*</sub>; **a**), emphasizing that *u* depends on **a**. We find the optimal **a** by minimizing ‖**U** − **F**(**a**)‖<sup>2</sup><sub>2</sub>. Here we particularly consider the Levenberg–Marquardt method [3,4], i.e., the reconstructed value **a**<sup>∗</sup> = lim<sub>*k*→∞</sub> **a**<sub>*k*</sub> is computed by the iteration given by

$$\mathbf{a}\_{k+1} = \mathbf{a}\_k + \left[ F'(\mathbf{a}\_k)^T F'(\mathbf{a}\_k) + \lambda I \right]^{-1} F'(\mathbf{a}\_k)^T \left( \mathbf{U} - \mathbf{F}(\mathbf{a}\_k) \right), \tag{23}$$

where *F*′(**a**) is the Jacobian matrix, which contains the derivatives of **F**(**a**) with respect to **a**, and the parameter *λ* is nonnegative. By modifying the original algorithm according to Ref. [32], our algorithm of the Levenberg–Marquardt method, which we call Algorithm 1, is described below.

#### **Algorithm 1:** Levenberg–Marquardt (LM)

1. Set *k* = 0 and *λ* = 1.
2. Calculate **F**(**a**<sub>*k*</sub>) and *F*′(**a**<sub>*k*</sub>).
3. Calculate *S*(**a**<sub>*k*</sub>) = **d**<sup>*T*</sup>**d**, where **d** = **U** − **F**(**a**<sub>*k*</sub>).
4. Prepare *A* = *F*′(**a**<sub>*k*</sub>)<sup>*T*</sup>*F*′(**a**<sub>*k*</sub>) and **v** = *F*′(**a**<sub>*k*</sub>)<sup>*T*</sup>**d**.
5. Find *δ* from (*A* + *λI*)*δ* = −**v**.
6. Obtain *S*(**a**<sub>*k*</sub> + *δ*) and *R* = (*S*(**a**<sub>*k*</sub>) − *S*(**a**<sub>*k*</sub> + *δ*)) / (−*δ*<sup>*T*</sup>(2**v** + *Aδ*)).
7. If *R* < 0.25, then set *ν* = 10 (*α<sub>c</sub>* < 0.1), *ν* = 1/*α<sub>c</sub>* (0.1 ≤ *α<sub>c</sub>* ≤ 0.5), or *ν* = 2 (*α<sub>c</sub>* > 0.5), where *α<sub>c</sub>* = [2 − (*S*(**a**<sub>*k*</sub> + *δ*) − *S*(**a**<sub>*k*</sub>))/*δ*<sup>*T*</sup>**v**]<sup>−1</sup>. If *R* < 0.25 and *λ* = 0, set *λ* = 1/‖*A*<sup>−1</sup>‖ and *ν* = *ν*/2. In the case of *R* < 0.25, we set *λ* = *νλ*. If *R* > 0.75, then we set *λ* = *λ*/2. If *R* > 0.75 and *λ* < 1/‖*A*<sup>−1</sup>‖, set *λ* = 0. Otherwise, when 0.25 ≤ *R* ≤ 0.75, *λ* is not updated.
8. If *S*(**a**<sub>*k*</sub> + *δ*) ≥ *S*(**a**<sub>*k*</sub>), then return to Step 5.
9. If *S*(**a**<sub>*k*</sub> + *δ*) < *S*(**a**<sub>*k*</sub>), set **a**<sub>*k*+1</sub> = **a**<sub>*k*</sub> + *δ*. Then put *k* + 1 → *k* and go back to Step 2.

Repeat the above procedure until one of the stopping criteria ‖*δ*‖ < 10<sup>−4</sup> and *S* < 10<sup>−14</sup> is fulfilled.

#### *2.3. Inverse Problems by Markov Chain Monte Carlo*

For simplicity in this section, we describe the Metropolis–Hastings Markov chain Monte Carlo algorithm (MH-MCMC) assuming *a* is the scalar appearing in Section 3.2. The extension of the derivation to the general case of vector **a** is straightforward.

Suppose that the coefficient *μ<sup>a</sup>* is unknown and *μ<sup>a</sup>* depends on a parameter *a*. We will reconstruct *a* within the framework of the Bayesian inversion algorithm [33,34], i.e., we will find the probability distribution *π*(*a*|**U**) of *a* for measured data **U**. Let *f*prior(*a*) be the prior probability density and *π*(**U**|*a*) be the likelihood density or the conditional probability density of **U** for *a*. Then the joint probability density *π*(*a*, **U**) of *a*, **U** is given by *π*(*a*, **U**) = *π*(**U**|*a*)*f*prior(*a*). According to the Bayes formula, the conditional probability density *π*(*a*|**U**) is given by

$$\pi(a|\mathbf{U}) = \frac{L(\mathbf{U}|a) f\_{\text{prior}}(a)}{\int\_{-\infty}^{\infty} L(\mathbf{U}|a) f\_{\text{prior}}(a) \, da} \,\tag{24}$$

where *L*(**U**|*a*) is a function proportional to *π*(**U**|*a*). Assuming Gaussian noise, we put

$$L(\mathbf{U}|a) = e^{-\frac{1}{2\sigma\_e^2} \|\mathbf{U} - \mathbf{F}(a)\|\_2^2}. \tag{25}$$

In this paper, we simply set *f*prior(*a*) = 1, i.e., we can say *f*prior(*a*) ∝ **1**<sub>[*a*min,*a*max]</sub>(*a*) where the interval [*a*min, *a*max] is large enough that all *a*'s appearing in the Markov chain fall into this interval. Here, **1**<sub>*A*</sub>(*a*) is the indicator function defined as **1**<sub>*A*</sub>(*a*) = 1 for *a* ∈ *A* and = 0 for *a* ∉ *A*. General uniform distributions can be used for *f*prior(*a*) if we use the prior-reversible proposal that satisfies *f*prior(*a*)*q*(*a*′|*a*) = *f*prior(*a*′)*q*(*a*|*a*′) (see below for the proposal distribution *q*(*a*′|*a*)) [35]. Another possible choice of *f*prior(*a*) is a Gaussian distribution, which gives rise to a Tikhonov regularization term in the cost function.

Using the Metropolis–Hastings algorithm, we can evaluate *π*(*a*|**U**) even when the normalization factor is not known and only the relation *π*(*a*|**U**) ∝ *L*(**U**|*a*)*f*prior(*a*) is available [33,34]. We can find *π*(*a*|**U**) using a sequence *p*<sub>1</sub>, *p*<sub>2</sub>, ... as

$$\pi(a|\mathbf{U}) = \lim\_{k \to \infty} p\_k(a),\tag{26}$$

where *p*<sub>*k*+1</sub>(*a*) is obtained from *p<sub>k</sub>*(*a*) (Markov chain) as

$$p\_{k+1}(a') = \int\_{-\infty}^{\infty} K(a', a) \, p\_k(a) \, da. \tag{27}$$

For all *a*, *a*′, the transition kernel satisfies

$$\mathbb{K}(a',a) \ge 0, \qquad \int\_{-\infty}^{\infty} \mathbb{K}(a',a) \, da' = 1. \tag{28}$$

Let us write *K*(*a*′, *a*) as

$$K(a',a) = \alpha(a',a)q(a'|a) + r(a)\delta(a'-a),\tag{29}$$

where *q*(*a*′|*a*) is the proposal distribution and

$$r(a) = 1 - \int\_{-\infty}^{\infty} \alpha(a', a) q(a'|a) \, da'. \tag{30}$$

In the Metropolis–Hastings algorithm we set *α*(*a*′, *a*) = min {1, *π*(*a*′|**U**)*q*(*a*|*a*′)/[*π*(*a*|**U**)*q*(*a*′|*a*)]}. With this choice of *α*(*a*′, *a*), the detailed balance is satisfied and *K*(*a*, *a*′)*π*(*a*′|**U**) = *K*(*a*′, *a*)*π*(*a*|**U**). A common choice of *q*(*a*′|*a*) is the normal distribution, i.e., *q*(·|*a*) = N(*a*, *ε*<sup>2</sup>) with mean *a* and standard deviation *ε* > 0. We note that *q*(*a*′|*a*) = *q*(*a*|*a*′) in this case. We have

$$\int\_{-\infty}^{\infty} h(a)\,\pi(a|\mathbf{U}) \, da = \lim\_{k\_{\text{max}} \to \infty} \frac{1}{k\_{\text{max}}} \sum\_{k=1}^{k\_{\text{max}}} h(a\_k), \tag{31}$$

where *a<sub>k</sub>* ∼ *p<sub>k</sub>*(·) and *h* is an arbitrary continuous bounded function.
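A minimal random-walk Metropolis–Hastings sampler with the Gaussian proposal *q*(·|*a*) = N(*a*, *ε*<sup>2</sup>) can be sketched as follows; since the proposal is symmetric, the acceptance probability reduces to min {1, *π*(*a*′|**U**)/*π*(*a*|**U**)}. The log-posterior interface and the toy Gaussian target are our illustrative choices, not the diffusion posterior of the paper.

```python
import math, random

def metropolis_hastings(log_post, a0, eps=0.1, n_steps=10000, seed=0):
    """Random-walk Metropolis-Hastings with proposal q(.|a) = N(a, eps^2)."""
    rng = random.Random(seed)
    a, lp = a0, log_post(a0)
    chain = [a]
    for _ in range(n_steps):
        a_new = a + rng.gauss(0.0, eps)          # propose from N(a, eps^2)
        lp_new = log_post(a_new)
        # symmetric proposal: accept with probability min{1, pi(a'|U)/pi(a|U)}
        if math.log(rng.random() + 1e-300) < lp_new - lp:
            a, lp = a_new, lp_new
        chain.append(a)
    return chain

# usage: sample a toy Gaussian posterior with mean 1.5 and sigma 0.2,
# starting far away at a = 0; the early part of the chain is the burn-in
chain = metropolis_hastings(lambda a: -0.5 * (a - 1.5) ** 2 / 0.2 ** 2,
                            a0=0.0, n_steps=20000)
post_mean = sum(chain[5000:]) / len(chain[5000:])
```

After discarding the burn-in samples, the chain average approximates the posterior mean, in the sense of Equation (31).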

Simulated annealing [36] is a variant of the Metropolis–Hastings MCMC algorithm in which the *temperature* *σ<sub>e</sub>* decreases during the simulation. The algorithm is summarized below as Algorithm 2. In this paper we consider two temperatures.

**Algorithm 2:** Two-temperature simulated annealing (SA)

1. Set a large *σ<sub>e</sub>* as a high temperature.

2. Generate *a*′ ∼ *q*(·|*a<sub>k</sub>*) = N(*a<sub>k</sub>*, *ε*<sup>2</sup>) with *ε* > 0 for given *a<sub>k</sub>*.

3. Calculate *α*(*a*′, *a<sub>k</sub>*) = min {1, *π*(*a*′|**U**)/*π*(*a<sub>k</sub>*|**U**)}.


Now we propose an MCMC-iterative hybrid method by combining Algorithm 1 with the Metropolis–Hastings Markov chain Monte Carlo. We first run the Markov chain Monte Carlo. Although the Metropolis–Hastings MCMC eventually gives the correct solution after about 10,000 steps, when the chain reaches the steady state, we stop the Monte Carlo simulation well before the burn-in time. At the *k<sub>b</sub>*th Monte Carlo step, after the initial rapid changes cease, we record the obtained reconstructed value *a*<sub>*k<sub>b</sub>*</sub> and switch to Algorithm 1 with the recorded *a*<sub>*k<sub>b</sub>*</sub> as the initial value. It is important that *k<sub>b</sub>* is chosen after the initial rapid changes, although *k<sub>b</sub>* can be well short of the burn-in time. Apart from that, the final reconstructed results are quite robust against the choice of *k<sub>b</sub>*. We refer to the following hybrid algorithm as Algorithm 3.

#### **Algorithm 3:** Hybrid


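As a concrete illustration of the two phases, the following sketch runs a short Metropolis–Hastings phase on a toy double-well cost (a hypothetical stand-in for ‖**U** − **F**(*a*)‖<sup>2</sup>, not the model of Section 2.5) and then polishes with plain gradient descent in place of the Levenberg–Marquardt step. As a small robustness tweak relative to the *a*<sub>*k<sub>b</sub>*</sub> rule above, the descent is seeded from the lowest-cost sample visited.

```python
import math, random

def cost(a):
    # toy double-well: local minimum near a = +0.96, global minimum near a = -1.04
    return (a * a - 1.0) ** 2 + 0.3 * a

def hybrid(a0, n_mc=100, eps=0.5, sigma=1.0, lr=0.02, n_iter=500, seed=1):
    """MH-MCMC warm start followed by a deterministic descent (LM stand-in)."""
    rng = random.Random(seed)
    a, best, best_cost = a0, a0, cost(a0)
    for _ in range(n_mc):                    # phase 1: Metropolis-Hastings jumps
        a_new = a + rng.gauss(0.0, eps)
        if rng.random() < math.exp(min(0.0, (cost(a) - cost(a_new)) / sigma ** 2)):
            a = a_new
            if cost(a) < best_cost:
                best, best_cost = a, cost(a)
    a = best                                 # seed the iterative phase
    for _ in range(n_iter):                  # phase 2: deterministic descent
        grad = (cost(a + 1e-6) - cost(a - 1e-6)) / 2e-6   # numerical derivative
        a -= lr * grad
    return a

# descent alone from a0 = 0.9 is trapped by the local minimum near +0.96,
# while the hybrid run has a chance to jump into the global valley near -1.04
a_trapped = hybrid(0.9, n_mc=0)
a_hybrid = hybrid(0.9)
```

With `n_mc = 0` the scheme reduces to the pure iterative method and stays in the basin of its initial guess; the ~100 Monte Carlo steps give the iteration a chance to start in the valley of the global minimum.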
We close this subsection with the Gelman–Rubin convergence diagnostic, which uses the intra-chain and inter-chain variances [37,38]. We run *M*<sub>MC</sub> different chains with different initial values. Let *a<sup>m</sup><sub>k</sub>* denote the *k*th value in the *m*th chain (*k* = 1, ... , *k<sub>b</sub>*, *m* = 1, ... , *M*<sub>MC</sub>). We discard the first *k<sub>a</sub>* − 1 steps, before the initial rapid changes cease. Then we compute the following intra-chain average and variance.

$$\bar{a}\_m = \frac{1}{k\_b - k\_a + 1} \sum\_{k=k\_a}^{k\_b} a\_k^m, \quad \sigma\_m^2 = \frac{1}{k\_b - k\_a} \sum\_{k=k\_a}^{k\_b} (a\_k^m - \bar{a}\_m)^2. \tag{32}$$

Next we compute the inter-chain mean and variance.

$$\bar{a} = \frac{1}{M\_{\rm MC}} \sum\_{m=1}^{M\_{\rm MC}} \bar{a}\_m, \quad B = \frac{k\_b - k\_a + 1}{M\_{\rm MC} - 1} \sum\_{m=1}^{M\_{\rm MC}} (\bar{a}\_m - \bar{a})^2. \tag{33}$$

Now we introduce

$$\hat{V} = \frac{k\_b - k\_a}{k\_b - k\_a + 1} W + \frac{1}{k\_b - k\_a + 1} B, \tag{34}$$

where *W* = (1/*M*<sub>MC</sub>) ∑<sup>*M*<sub>MC</sub></sup><sub>*m*=1</sub> *σ*<sup>2</sup><sub>*m*</sub>. The quantity *V̂* is an unbiased estimator of the true variance. But *W* is also an unbiased estimator of the true variance if the chains converge. Therefore, we have *V̂*/*W* ≈ 1 if the chains have converged. Below, we will see that we can choose *k<sub>b</sub>* for which *V̂*/*W* is not close to 1. In this paper, we set *M*<sub>MC</sub> = 10.
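The diagnostic of Equations (32)–(34) can be computed as in the following sketch. Here it is fed with synthetic, already-converged chains (independent Gaussian draws, an illustrative assumption); for such chains the ratio *V̂*/*W* is close to 1.

```python
import random

def gelman_rubin(chains):
    """Ratio V_hat / W of Eqs. (32)-(34) for M_MC chains of equal length."""
    m = len(chains)                                   # M_MC
    n = len(chains[0])                                # k_b - k_a + 1 retained steps
    means = [sum(c) / n for c in chains]              # intra-chain averages
    variances = [sum((x - mu) ** 2 for x in c) / (n - 1)
                 for c, mu in zip(chains, means)]     # intra-chain variances
    grand = sum(means) / m                            # inter-chain mean
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)  # inter-chain variance
    W = sum(variances) / m
    V_hat = (n - 1) / n * W + B / n                   # Eq. (34)
    return V_hat / W

# synthetic "converged" chains: independent draws from one distribution
rng = random.Random(0)
chains = [[rng.gauss(1.5, 0.2) for _ in range(500)] for _ in range(10)]
ratio = gelman_rubin(chains)
```

For chains stopped well before the burn-in time, as in Algorithm 3, the same computation returns a ratio noticeably different from 1.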

#### *2.4. TRS Measurements of a Polyurethane-Based Phantom*

In this section, we consider time-resolved measurements for a phantom. The solid phantom (biomimic optical phantom) is made of polyurethane to simulate biological tissues (INO, Quebec, QC, Canada). The absorption and reduced scattering coefficients are *μ<sub>a</sub>* = 0.0209 mm<sup>−1</sup> and *μ*′<sub>*s*</sub> = 0.853 mm<sup>−1</sup> at the wavelength 800 nm. The refractive index of the phantom is *n* = 1.51. The measurements were performed by the TRS (time-resolved spectroscopy) instrument (TRS-80, Hamamatsu Photonics K.K., Hamamatsu, Japan). Two optical fibers from TRS-80 are attached to the top of the phantom with a separation of 13 mm (*M<sub>s</sub>* = *M<sub>d</sub>* = 1). The phantom is illuminated by near-infrared light of the wavelength 760 nm through one optical fiber, and the reflected light is detected by the other optical fiber. The time interval of the measured data is 10 ps, and we used the data from *t*<sup>1</sup> = 2.00 ns to *t*<sup>*M<sub>t</sub>*</sup> = 8.00 ns (*M<sub>t</sub>* = 601). Measured counts, i.e., the numbers of photons, are shown in Figure 1. The upper panel of Figure 1 shows the instrument response function (IRF), which is given as a property of the experimental device. The measured reflected light is shown in the lower panel of Figure 1. The function *q*(*t<sup>k</sup>*) is the instrument response function divided by the maximum value of the measured reflected light. The peak of *q* is at 2.56 ns. {**U**}<sub>11*k*</sub> = *y*<sub>11*k*</sub> is the measured reflected light normalized by its maximum value. In the lower panel of Figure 1, *t*<sup>1</sup> and *t*<sup>*M<sub>t</sub>*</sup> are marked by vertical dashed lines. The size of the phantom is large enough that we can assume the three-dimensional half space. Then the energy density of the detected light is computed by Equation (6). The parameter ℓ is obtained from Equation (3).

**Figure 1.** Measured data from TRS-80. The instrument response function and measured reflected light are shown in the upper and lower panels, respectively. In the lower panel, the dashed lines show *t*<sup>1</sup> = 2.00 ns and *tMt* = 8.00 ns (*Mt* = 601).

#### *2.5. Numerical Phantom*

To examine when the iterative scheme fails by being trapped by a local minimum, and how the Markov chain Monte Carlo is capable of escaping from such local minima, we devise a toy model which is simple enough that the structure of the cost function can be understood explicitly, but complicated enough that the cost function has one local minimum and one global minimum.

We consider diffuse optical tomography in the two-dimensional space. In our toy model, we suppose that the diffusion coefficient *D* is constant everywhere in the medium but there is an absorption inhomogeneity, and the absorption perturbation *δμ<sub>a</sub>* is unknown. Moreover, we assume that *δμ<sub>a</sub>*(*ρ*) is given by

$$
\delta\mu\_a(\rho) = \eta f\_a(x)\delta(y - y\_0),
\tag{35}
$$

where *η*, *y*<sub>0</sub> are given positive constants. Here, *f<sub>a</sub>*(*x*) is given by

$$f\_a(x) = \left[a^3 + 3\left(1 + \frac{\tanh x^2}{10}\right)a^2\right] \left(1 - \tanh x^2\right),\tag{36}$$

where *a* > 1.1 is a constant. Thus, *a* is the parameter to be reconstructed in our toy inverse problem. As is shown in Figure 2, the function *f<sub>a</sub>*(*x*) has one peak at *x* = 0 with the maximum value *f<sub>a</sub>*(0) = *a*<sup>2</sup>(*a* + 3), *f<sub>a</sub>* monotonically decays for |*x*| > 0, and *f<sub>a</sub>* → 0 as |*x*| → ∞.
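The profile (36) and its peak value *f<sub>a</sub>*(0) = *a*<sup>2</sup>(*a* + 3) can be checked numerically with a few lines:

```python
import math

def f_a(x, a):
    """The absorber profile f_a(x) of Eq. (36)."""
    th = math.tanh(x * x)
    return (a ** 3 + 3.0 * (1.0 + th / 10.0) * a ** 2) * (1.0 - th)

peak = f_a(0.0, 1.5)   # equals a^2 (a + 3) = 10.125 for a = 1.5
```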

Now we describe how the forward data are computed. The units of length and time are taken to be mm and ps, respectively. On the *x*-axis, we place two sources (*M<sub>s</sub>* = 2) at *ρ*<sup>1</sup><sub>*s*</sub> = (−20, 0), *ρ*<sup>2</sup><sub>*s*</sub> = (20, 0) and three detectors (*M<sub>d</sub>* = 3) at *ρ*<sup>1</sup><sub>*d*</sub> = (−40, 0), *ρ*<sup>2</sup><sub>*d*</sub> = (0, 0), *ρ*<sup>3</sup><sub>*d*</sub> = (40, 0). We set *μ*′<sub>*s*</sub> = 1, *μ<sub>a</sub>* = 0.02, *n* = 1.37. Suppose that there is an absorption inhomogeneity at depth 5. For *δμ<sub>a</sub>*, we put *η* = 300/*c*, *y*<sub>0</sub> = 5, and

$$a = 1.5.\tag{37}$$

To distinguish it from reconstructed values, hereafter the true value of *a* is denoted by *ā*. We assume 3% Gaussian noise and give the measured data as

$$y\_{ijk} = u(\rho\_d^j, t^k; \rho\_s^i; \bar{a})(1 + e\_{ijk}),\tag{38}$$

where *e*<sub>*ijk*</sub> ∼ N(0, 0.03<sup>2</sup>). In this numerical experiment, *M*<sub>*t*</sub> = 500 and *t*<sup>*k*+1</sup> − *t*<sup>*k*</sup> = *t*<sup>1</sup> = 5 (*k* = 1, . . . , *M*<sub>*t*</sub> − 1).
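The multiplicative noise model of Equation (38) is an elementwise operation; in the sketch below, `u_clean` is a dummy positive array standing in for the forward data *u*, which in the paper comes from Equation (39).

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy stand-in for u(rho_d^j, t^k; rho_s^i; a_bar) on the
# (M_s, M_d, M_t) = (2, 3, 500) source-detector-time grid.
u_clean = np.exp(-np.linspace(0.0, 5.0, 2 * 3 * 500)).reshape(2, 3, 500)

# Eq. (38): y_ijk = u_ijk (1 + e_ijk) with e_ijk ~ N(0, 0.03^2).
e = rng.normal(0.0, 0.03, size=u_clean.shape)
y = u_clean * (1.0 + e)

print(y.shape)                                              # (2, 3, 500)
print(bool(abs(np.std(y / u_clean - 1.0) - 0.03) < 0.005))  # True: ~3% noise level
```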

**Figure 2.** The function *fa*(*x*) in Equation (36) is plotted for *a* = 1.3, 1.5, and 1.7.

By substituting the form (35) into (21), we can express the energy density as

$$\begin{split} u(\rho\_d^j, t^k; \rho\_s^i; a) &= u\_0(\rho\_d^j, t^k; \rho\_s^i) \exp\left[ -\frac{\eta e^{-\mu\_{a0} c t^k}}{u\_0(\rho\_d^j, t^k; \rho\_s^i)} \int\_0^{t^k} g(0, t^k; y\_0, s)\, g(y\_0, s; 0, 0) \right. \\ &\quad \left. \times \left( \int\_{-\infty}^{\infty} f\_a(x') e^{-\frac{(x\_d^j - x')^2}{4Dc(t^k - s)}} e^{-\frac{(x' - x\_s^i)^2}{4Dcs}} \, dx' \right) ds \right], \end{split} \tag{39}$$

where

$$u\_0 = \frac{e^{-\mu\_{a0} c t^k}}{2\pi D c t^k}\, e^{-\frac{(\rho\_d^j - \rho\_s^i)^2}{4Dct^k}} \left[1 - \frac{\sqrt{\pi Dct^k}}{\ell} e^{\left(\frac{\sqrt{Dct^k}}{\ell}\right)^2} \text{erfc}\left(\frac{\sqrt{Dct^k}}{\ell}\right)\right].\tag{40}$$
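In code, Equation (40) is a one-liner apart from the numerically delicate factor *e*<sup>*z*²</sup> erfc(*z*). The sketch below assumes *D* = 1/(3(*μ*′<sub>*s*</sub> + *μ*<sub>*a*</sub>)), *c* = 0.3/*n* mm/ps with *n* = 1.37, and an illustrative extrapolation length ℓ = 1 mm; these defaults are our assumptions, not values fixed by the paper.

```python
import math

def u0(rho_d, rho_s, t, mu_a0=0.02, D=1.0 / (3.0 * (1.0 + 0.02)),
       c=0.3 / 1.37, ell=1.0):
    """Homogeneous-medium energy density of Eq. (40); lengths in mm, times in ps."""
    Dct = D * c * t
    r2 = (rho_d[0] - rho_s[0]) ** 2 + (rho_d[1] - rho_s[1]) ** 2
    z = math.sqrt(Dct) / ell
    # exp(z^2) * erfc(z) ~ 1/(z sqrt(pi)) for large z; both factors stay
    # representable for the moderate z arising here.
    boundary = 1.0 - math.sqrt(math.pi * Dct) / ell * math.exp(z * z) * math.erfc(z)
    return (math.exp(-mu_a0 * c * t) / (2.0 * math.pi * Dct)
            * math.exp(-r2 / (4.0 * Dct)) * boundary)

val = u0((-40.0, 0.0), (-20.0, 0.0), 1000.0)        # detector 20 mm from the source
print(0.0 < val < 1.0)                              # True: a small positive density
print(u0((-40.0, 0.0), (20.0, 0.0), 1000.0) < val)  # True: farther pair, less light
```

For small *t* or small ℓ the argument *z* becomes large and `math.exp(z * z)` overflows; a scaled routine such as `scipy.special.erfcx` is the robust choice in that regime.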

#### **3. Results**

#### *3.1. Determination of Optical Properties*

Here we consider reconstructions from the measured data in the phantom experiment. Let us first consider the following initial guesses.

$$\text{Case 1:} \qquad \mu\_a = 0.01 \,\text{mm}^{-1}, \quad \mu\_s' = 1.0 \,\text{mm}^{-1}. \tag{41}$$

In Case 1, the following *μ*<sub>*a*</sub>, *μ*′<sub>*s*</sub> are obtained both by Algorithm 1 (LM) and Algorithm 3 (hybrid).

$$
\mu\_a = 0.016 \,\text{mm}^{-1}, \quad \mu\_s' = 0.63 \,\text{mm}^{-1}. \tag{42}
$$

The results for *μ*<sub>*a*</sub> and *μ*′<sub>*s*</sub> are shown in Figures 3 and 4, respectively. In Figure 3, reconstructed values of *μ*<sub>*a*</sub> are plotted against the iteration number *k*. In the top panel of Figure 3, we see that Algorithm 1 (LM) quickly converges. In Algorithms 2 and 3, we set *k*<sub>*b*</sub> = 99. As seen in the middle panel of Figure 3, the convergence of Algorithm 2 (SA) is slow. The *temperature* is decreased from *σ*<sub>*e*</sub> = 10<sup>−6</sup> to *σ*<sub>*e*</sub> = 10<sup>−7</sup> at the *k*<sub>*b*</sub>th Monte Carlo step. Similarly, *ε* is changed from 0.1 to 0.001. Algorithm 2 returns the correct values after many Monte Carlo steps; we have *V̂*/*W* = 1.02, 1.06 for *μ*<sub>*a*</sub>, *μ*′<sub>*s*</sub>, respectively, when *k*<sub>*a*</sub> = 9000 and *k*<sub>*b*</sub> = 10,000. Finally, in the bottom panel of Figure 3, we see that the iteration of Algorithm 3 (hybrid) converges immediately after we switch from the Metropolis–Hastings MCMC (*σ*<sub>*e*</sub> = 10<sup>−6</sup>, *ε* = 0.1) to Algorithm 1 (LM). No convergence of the Monte Carlo chain is required; we found *V̂*/*W* = 8.6 for *μ*<sub>*a*</sub> and 11.0 for *μ*′<sub>*s*</sub> (*k*<sub>*a*</sub> = 49, *k*<sub>*b*</sub> = 99). A similar behavior is observed for the reconstruction of *μ*′<sub>*s*</sub> in Figure 4. In the top panel of Figure 4, we find that Algorithm 1 (LM) works best and converges after a few iterations, whereas Algorithm 2 (SA) in the middle panel of Figure 4 converges slowly, and reconstructed values around *k*<sub>*b*</sub> = 99 are still away from *μ*′<sub>*s*</sub> in (42). By switching from MH-MCMC to Algorithm 1 (LM) in Algorithm 3 (hybrid), convergence is easily obtained, as shown in the bottom panel of Figure 4.
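A single Metropolis–Hastings update of the kind run before the switch can be isolated as below. Passing the random draws in as arguments is a hypothetical interface chosen so that one update is deterministic to inspect, and the Gaussian likelihood exp(−cost/(2*σ*<sub>*e*</sub><sup>2</sup>)) is our reading of how the temperature *σ*<sub>*e*</sub> enters; neither is the paper's exact code.

```python
import math

def mh_step(a, cost, eps, sigma_e, xi_prop, u_acc):
    """One Metropolis-Hastings update for a scalar parameter.

    xi_prop ~ N(0, 1) drives a Gaussian proposal of scale eps;
    u_acc ~ U(0, 1) decides acceptance against exp(-cost / (2 sigma_e^2)).
    Returns the new state and whether the proposal was accepted.
    """
    a_new = a + eps * xi_prop
    log_alpha = (cost(a) - cost(a_new)) / (2.0 * sigma_e**2)
    if math.log(u_acc) < min(0.0, log_alpha):
        return a_new, True
    return a, False

cost = lambda a: (a - 1.0) ** 2                  # toy quadratic cost
print(mh_step(0.0, cost, 0.1, 1.0, 1.0, 0.5))    # (0.1, True): downhill, accepted
print(mh_step(0.0, cost, 0.1, 1.0, -1.0, 0.95))  # (0.0, False): uphill, rejected
```

Downhill proposals are always accepted; uphill proposals survive with a probability controlled by the temperature, which is what lets the chain leave a local valley.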

Now we start the simulation with the following initial values.

$$\text{Case 2:} \qquad \mu\_a = 0.5 \,\text{mm}^{-1}, \quad \mu\_s' = 1.0 \,\text{mm}^{-1}. \tag{43}$$

In Case 2, Algorithm 1 (LM) fails, returning *μ*<sub>*a*</sub> = 0.068 mm<sup>−1</sup> and *μ*′<sub>*s*</sub> = 1.75 mm<sup>−1</sup>, whereas Algorithm 3 (hybrid) still gives the correct values. The reconstructed values of *μ*<sub>*a*</sub> and *μ*′<sub>*s*</sub> at each iteration are shown in Figures 5 and 6, respectively. In Figure 5, the vertical axes of the left three panels run from 0 to 0.6 whereas those of the right three panels run from 0 to 0.08. In Figure 6, the vertical axes of the left three panels run from 0 to 2 while those of the right three panels run from 0 to 1.3. In the top panels of Figure 5, we see that the reconstruction of *μ*<sub>*a*</sub> by Algorithm 1 (LM) is unsuccessful. Algorithm 2 (SA) approaches *μ*<sub>*a*</sub> in Equation (42) but still deviates from that value in the middle panels of Figure 5. As shown in the bottom panels of Figure 5, Algorithm 3 (hybrid) converges in a few iterations after switching to Algorithm 1 (LM). In the top panels of Figure 6, Algorithm 1 (LM) fails to reconstruct *μ*′<sub>*s*</sub>. In the middle panels of Figure 6, reconstructed values by Algorithm 2 (SA) come close to *μ*′<sub>*s*</sub> in Equation (42) but suffer from slow convergence. In the bottom panels of Figure 6, we see that Algorithm 3 (hybrid) arrives at *μ*′<sub>*s*</sub> in Equation (42) without any problem.

**Figure 3.** Case 1: Reconstructed *μ*<sub>*a*</sub> by (top) Algorithm 1, (middle) Algorithm 2, and (bottom) Algorithm 3. The dashed lines show *μ*<sub>*a*</sub> in Equation (42).

**Figure 4.** Case 1: Reconstructed *μ*′<sub>*s*</sub> by (top) Algorithm 1, (middle) Algorithm 2, and (bottom) Algorithm 3. The dashed lines show *μ*′<sub>*s*</sub> in Equation (42).

**Figure 5.** (Left) Case 2: Reconstructed *μ*<sub>*a*</sub> by (top) Algorithm 1, (middle) Algorithm 2, and (bottom) Algorithm 3. The dashed lines show *μ*<sub>*a*</sub> in Equation (42). (Right) Same as the left panels but with vertical axes from 0 to 0.08.

**Figure 6.** (Left) Case 2: Reconstructed *μ*′<sub>*s*</sub> by (top) Algorithm 1, (middle) Algorithm 2, and (bottom) Algorithm 3. The dashed lines show *μ*′<sub>*s*</sub> in Equation (42). (Right) Same as the left panels but with vertical axes from 0 to 1.3.

The initial guesses in Case 1 are reasonably close to the values found in Equation (42), whereas in Case 2 we started with initial guesses quite different from those values. It is not surprising that Algorithm 1 (LM) does not work for Case 2, but Algorithm 3 (hybrid) can still arrive at the correct values. The numerical calculations were performed in Matlab (i7-8700 CPU, 16 GB memory). In the hybrid method, the Metropolis–Hastings MCMC has not reached the steady state at about 100 steps, but it can already be switched to the Levenberg–Marquardt method. For Figures 5 and 6, the computation time was 5 s. The simulated annealing method returns the correct value after about 10,000 steps, but it takes 8 min. The computation for Algorithm 1 (LM) stopped in 0.5 s.

Below we summarize the reconstructed values in Table 1. Although Algorithm 2 (SA) and Algorithm 3 (hybrid) return the same results after a long time, the hybrid scheme is about ten times faster, and moreover there is no need to choose a low temperature for the latter algorithm. For Algorithm 3, it is not necessary to wait until the burn-in time; it is enough if the initial rapid change has ceased. In Section 3.2, we illustrate that Algorithm 3 works once the Monte Carlo simulation escapes from a local minimum, and the algorithm does not require that the calculation reach the steady state.


**Table 1.** Reconstructed values of *μ*<sub>*a*</sub>, *μ*′<sub>*s*</sub> are shown for Case 1 and Case 2. The units of *μ*<sub>*a*</sub>, *μ*′<sub>*s*</sub> are both mm<sup>−1</sup>. For Algorithm 2, values at the 120th Monte Carlo step are shown.

#### *3.2. Determination of Absorption Inhomogeneity*

We perform a numerical experiment of diffuse optical tomography using the toy model. Let us first consider when the iterative scheme fails. We see that *F*′(0) = (∂/∂*a*)*u*(*ρ*<sup>*j*</sup>, *t*<sup>*k*</sup>; *ρ*<sup>*i*</sup>; *a*)|<sub>*a*=0</sub> = 0. For sufficiently small *ε* > 0, we have *F*′(*ε*) > 0, *F*′(−*ε*) < 0, |*F*′(±*ε*)| ≈ 0, and *U* − *F*(±*ε*) < 0. Therefore, if we start the iteration from *a*<sub>0</sub> = *ε*, we obtain

$$|a\_1| < \varepsilon, \quad |a\_2| < |a\_1|, \quad \cdots. \tag{44}$$

Thus, the sequence {*a*<sub>*k*</sub>} approaches 0 and can never arrive at *ā* (= 1.5).

Let us consider how the difference *U* − *F*(*a*) depends on *a*. We introduce

$$h(t) = \frac{1}{t} \exp\left(-\frac{y\_0^2}{4Dct}\right) \left[1 - \frac{\sqrt{\pi Dct}}{\ell} e^{\left(\frac{y\_0}{2\sqrt{Dct}} + \frac{\sqrt{Dct}}{\ell}\right)^2} \text{erfc}\left(\frac{y\_0}{2\sqrt{Dct}} + \frac{\sqrt{Dct}}{\ell}\right)\right].\tag{45}$$

The following form is obtained using Equation (39). By neglecting noise, we have

$$\begin{split} u(\rho\_d^j, t^k; \rho\_s^i; \bar{a}) - u(\rho\_d^j, t^k; \rho\_s^i; a) &= \frac{\eta e^{-\mu\_{a0} c t^k}}{(2\pi D)^2} \int\_0^{t^k} h(t^k - s)\, h(s) \\ &\quad \times \left[ \int\_{-\infty}^{\infty} d\_{\bar{a}}(a, x') \left(1 - \tanh x'^2\right) e^{-\frac{(x\_d^j - x')^2}{4Dc(t^k - s)}} e^{-\frac{(x' - x\_s^i)^2}{4Dcs}} \, dx' \right] ds, \end{split} \tag{46}$$

where *d*<sub>*ā*</sub>(*a*, *x*′) = *ξ*(*ā*, *x*′) − *ξ*(*a*, *x*′) with

$$\xi(a, x') = a^2 \left[ a + 3 \left( 1 + \frac{\tanh x'^2}{10} \right) \right]. \tag{47}$$

For a given *x*′, the function |*d*<sub>*ā*</sub>(*a*, *x*′)|<sup>2</sup> has one global minimum at *a* = *ā*, one local minimum at *a* = −2(1 + tanh *x*′<sup>2</sup>/10), and one local maximum at *a* = 0. Therefore, the above expression implies that Algorithm 1 (LM) fails when the initial value *a*<sub>0</sub> is negative, and the correct value *ā* = 1.5 is reconstructed only for a positive initial guess. Indeed, the computation ends up with −2.05 if we start the iteration from *a*<sub>0</sub> = −0.1 (see below) or −0.01, and the value 1.68 is obtained when *a*<sub>0</sub> = 0.01. Figure 7 shows |*d*<sub>*ā*</sub>(*a*, *x*)|<sup>2</sup> for *ā* = 1.5 and tanh(*x*<sup>2</sup>) = 0.5.
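This structure can be reproduced with a one-parameter Levenberg–Marquardt loop on the noiseless toy cost |*d*<sub>*ā*</sub>(*a*)|<sup>2</sup>, with tanh *x*<sup>2</sup> frozen at 0.5 as in Figure 7. The loop below is a generic textbook LM iteration with multiplicative damping control, not the paper's implementation.

```python
B = 0.05                                  # tanh(x^2)/10 with tanh(x^2) = 0.5
ABAR = 1.5                                # true parameter a_bar

def xi(a):
    """Eq. (47) at fixed x: xi(a) = a^2 [a + 3(1 + tanh(x^2)/10)]."""
    return a**2 * (a + 3.0 * (1.0 + B))

U = xi(ABAR)                              # noiseless "measurement"

def cost(a):
    return (U - xi(a))**2                 # critical points at a_bar, -2.1, and 0

def lm(a, n_iter=300, lam=1e-2):
    """1D Levenberg-Marquardt: accept a step only if the cost decreases."""
    for _ in range(n_iter):
        J = 3.0 * a**2 + 6.0 * (1.0 + B) * a          # xi'(a)
        step = J * (U - xi(a)) / (J * J + lam)
        if cost(a + step) < cost(a):
            a, lam = a + step, lam / 10.0
        else:
            lam *= 10.0
    return a

a_neg = lm(-0.1)    # pulled into the local minimum near a = -2.1
a_pos = lm(0.5)     # reaches the global minimum a = 1.5
print(round(a_neg, 3), round(a_pos, 3))
```

The accept/reject rule makes every accepted step a descent step, which is exactly why the iteration cannot climb out of the valley it starts in.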

In Figure 8, we plot reconstructed values of *a* against the iteration number. The initial value is set to *a*<sub>0</sub> = −0.1. The reconstruction by Algorithm 1 (LM) is shown in the top panel of Figure 8. As predicted from Figure 7, Algorithm 1 (LM) converges to the local minimum and can never arrive at the global minimum. The Monte Carlo simulation can jump over the local maximum and approach the global minimum, but keeps fluctuating, as shown in the middle panel of Figure 8. We initially set (*σ*<sub>*e*</sub>, *ε*) = (10<sup>−6</sup>, 0.5) for Algorithms 2 and 3. In Algorithm 2, we set (*σ*<sub>*e*</sub>, *ε*) = (10<sup>−7</sup>, 0.005) after the *temperature* decreases at the *k*<sub>*b*</sub>th Monte Carlo step. In the bottom panel of Figure 8, Algorithm 3 (hybrid) successfully arrives at the global minimum. Using Matlab, the computation time was 17 min, while we needed 3 h for Algorithm 2 (SA) (1000 steps).

**Figure 7.** The function |*d*<sub>*ā*</sub>(*a*, *x*)|<sup>2</sup> (*ā* = 1.5) is plotted for tanh(*x*<sup>2</sup>) = 0.5.

After the initial period with large jumps, it is found that we can set *k*<sub>*b*</sub> = 99 in our simulation of the MCMC–iterative hybrid method. The correct value *ā* is reconstructed by Algorithm 3 (hybrid) even when the simulation passes through a local minimum. Starting from *a*<sub>*k*<sub>*b*</sub></sub> = 1.8257, the calculation stops at *a*<sub>∗</sub> = 1.6841 (*a*<sub>∗</sub> = *a*<sub>*k*<sub>*b*</sub>+3</sub>). We note that the reconstructed *a*<sub>∗</sub> is not exactly 1.5 due to noise.

When the initial guess *a*<sub>0</sub> = −0.1 lies in the valley of the local minimum (see Figure 7), the reconstructed *a* by Algorithm 3 (hybrid) moves to the valley of the global minimum with the help of the Monte Carlo simulation, as shown in Figure 8, while the reconstructed *a* by Algorithm 1 (LM) falls into the local minimum, as is the nature of Newton-type iterative methods. For Algorithm 3 (hybrid), we obtained *a*<sub>∗</sub> = 1.6841. Negative reconstructed values may be obtained by Algorithm 2 (SA) and Algorithm 3 (hybrid) during the first few iterations if different pseudo-random numbers are used. These negative reconstructed values, however, turn positive, and the behavior of the reconstructed *a* by Algorithm 2 (SA) and Algorithm 3 (hybrid) is always more or less similar to the middle and bottom panels of Figure 8. For the initial guesses *a*<sub>0</sub> = −0.01 and *a*<sub>0</sub> = 0.01, Algorithm 3 (hybrid) works without any problem, and Algorithm 2 (SA) also returns a reasonable result after a sufficiently large number of iterations (results not shown).
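The escape-then-refine behavior can be sketched end to end on the same toy cost. The temperature `T` and jump length `eps` are illustrative stand-ins for the paper's (*σ*<sub>*e*</sub>, *ε*), and handing the LM stage the best state visited by the chain, rather than its final state, is our simplification; an individual short chain may or may not leave the local valley, but the subsequent LM stage always settles into one of the two minima.

```python
import numpy as np

rng = np.random.default_rng(7)
B, ABAR = 0.05, 1.5                       # tanh(x^2) frozen at 0.5; true value

def xi(a):
    return a**2 * (a + 3.0 * (1.0 + B))

U = xi(ABAR)

def cost(a):
    return (U - xi(a))**2                 # local minimum at -2.1, global at 1.5

# Stage 1: short Metropolis-Hastings run (k_b = 99 steps), tracking the
# best state visited.  T plays the role of 2 sigma_e^2 in the likelihood.
a = best = -0.1
T, eps = 50.0, 0.5
for _ in range(99):
    prop = a + eps * rng.standard_normal()
    if np.log(rng.random()) < (cost(a) - cost(prop)) / T:
        a = prop
    if cost(a) < cost(best):
        best = a

# Stage 2: switch to the damped Gauss-Newton (LM) iteration from `best`.
a, lam = best, 1e-2
for _ in range(300):
    J = 3.0 * a**2 + 6.0 * (1.0 + B) * a
    step = J * (U - xi(a)) / (J * J + lam)
    if cost(a + step) < cost(a):
        a, lam = a + step, lam / 10.0
    else:
        lam *= 10.0

print(round(a, 3))   # 1.5 if the chain escaped the local valley, else -2.1
```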

**Figure 8.** Reconstructed *a* (*ā* = 1.5) by (top) Algorithm 1 (LM), (middle) Algorithm 2 (SA), and (bottom) Algorithm 3 (hybrid) for the initial value *a*<sub>0</sub> = −0.1.

#### **4. Discussion**

In this paper, we have proposed a hybrid numerical scheme which uses Markov chain Monte Carlo in the first step and then an iterative method in the second step. We switch from MH-MCMC to LM by observing the proposed parameter values. For the typical jump length of parameters in MH-MCMC, *ε* = 0.1 was used in Section 3.1 and *ε* = 0.5 was chosen in Section 3.2. Although these values were set so that the MH-MCMC calculation was performed efficiently, other choices of *ε* are also possible. The proposed scheme is quite general and can be applied to different inverse problems in NIRS which are solved by iterative methods, even when the forward problem must be solved fully numerically with the finite difference method or the finite element method. How the hybrid scheme can be extended to diffuse optical tomography, which has many unknowns, is an interesting topic for future study.

More sophisticated algorithms than the Metropolis–Hastings Markov chain Monte Carlo used in this paper have been proposed to overcome slow convergence. The delayed rejection scheme reduces the net rejection rate [39]. In the adaptive Metropolis algorithm, parameters of the proposal distribution are adjusted during the Monte Carlo steps [40]. The DRAM algorithm, which combines these two schemes, has also been proposed [41]. Two-level MCMC algorithms [42,43] and a multi-level MCMC [23] have been investigated to improve the MCMC algorithm. Such Monte Carlo schemes might improve the first step of our hybrid method by finding the valley of the global minimum more easily.

In addition to simulated annealing, quantum annealing [44] has been developed in connection with the Metropolis–Hastings Markov chain Monte Carlo algorithm. Brute-force search and genetic algorithms [45] are also well-known optimization methods that aim at escaping from local minima. The introduction of these methods to NIRS may prove useful in the future.

For the clinical use of NIRS, it has been recognized from the early days that finding absolute values of the absorption and scattering coefficients is important [2]. In Ref. [15], it is emphasized that, for the study of the infant brain with their measurement system, the obtained absolute values depend strongly on the initial estimates. By performing Markov chain Monte Carlo before using standard iterative schemes, we may automatically acquire good initial values. Such clinical application is a natural next step of our research.

**Author Contributions:** Conceptualization, M.M. and G.N.; methodology, Y.J. and M.M.; software, Y.J. and M.M.; formal analysis, Y.J.; investigation, Y.H.; data curation, Y.J. and Y.H.; writing—original draft preparation, M.M.

**Funding:** The first author (Y.J.) is supported by National Natural Science Foundation of China (No. 11971121). Y.H. and M.M. acknowledge support from Grant-in-Aid for Scientific Research (17H02081) of the Japan Society for the Promotion of Science. M.M. also acknowledges support from Grant-in-Aid for Scientific Research (17K05572) of the Japan Society for the Promotion of Science and from the JSPS A3 foresight program: Modeling and Computation of Applied Inverse Problems. G.N. was supported by Grant-in-Aid for Scientific Research (15K21766 and 15H05740) of the Japan Society for the Promotion of Science.

**Acknowledgments:** The authors thank Goro Nishimura for fruitful discussions on optical tomography and for the use of his computer facilities at Hokkaido University. We thank Tatsunori Emi, Takeshi Iwasaki, and Tomoya Sugiyama for their help with related experiments which stimulated the present work.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
