Review

Statistics for Continuous Time Markov Chains, a Short Review

by Manuel L. Esquível 1,* and Nadezhda P. Krasii 2,3
1 Department of Mathematics, NOVA School of Science and Technology and NOVA Math, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
2 NOVA Math, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
3 Department of Higher Mathematics, Don State Technical University, Gagarin Square 1, Rostov-on-Don 344000, Russia
* Author to whom correspondence should be addressed.
Axioms 2025, 14(4), 283; https://doi.org/10.3390/axioms14040283
Submission received: 28 February 2025 / Revised: 1 April 2025 / Accepted: 3 April 2025 / Published: 8 April 2025

Abstract: This review article aims to provide a global context for several works on the fitting of continuous time nonhomogeneous Markov chains with finite state space, and to point out selected aspects of two previously introduced techniques—estimation and calibration—that are relevant for applications and are used to fit a continuous time Markov chain model to data by an adequate selection of parameters. The denomination estimation suits the procedure better when statistical techniques—e.g., maximum likelihood estimators—are employed, while calibration covers the case where, for instance, some optimisation technique finds a best-approximation parameter ensuring a good model fit. For completeness, we provide a short summary of well-known important notions and results formulated for nonhomogeneous Markov chains that, in general, can be transferred to the homogeneous case. Then, as an illustration for the homogeneous case, we present a selected result of Billingsley on parameter estimation for irreducible chains with finite state space. In the nonhomogeneous case, we quote two recent results, one of the calibration type and the other with more of a statistical flavour. We provide an ample set of bibliographic references so that readers wanting to pursue their studies will be able to do so more easily and productively.

1. Introduction

In this review, we will consider Markov chains, in continuous time, with a finite state space. This kind of Markov chain has been used in many modelling studies in actuarial mathematics (see [1] for Markov and semi-Markov models and [2] for demography), in risk and credit management (see [3,4,5,6,7]), in social sciences (see [8] and the articles described in [9]), in biology and medicine (see [10,11,12,13] for simulation studies comparing discrete with continuous time Markov chains and applications to Alzheimer-type dementia, pathology in late life and to normal ageing of the brain in contrast to Alzheimer’s disease, and to ambulatory hypertension monitoring, respectively) and in many other areas.
Our main goal is to provide a summary description of the main ideas—with relevant bibliographic references—whenever these are solidly established and, in the case of nonhomogeneous Markov chains, to enter into the detail of some particular ideas and results susceptible of improvement and further research. As such, we view this contribution as a guide to at least part of the relevant literature on the subject. Let us briefly describe the organisation of this text.
  • In Section 2, we mention the main concepts and results on continuous time Markov chains that provide a context and reference for the remainder of this text.
  • Section 3, dealing with homogeneous Markov chains, is devoted to a brief presentation of the most important ideas on statistical inference for these chains, mainly for those having a finite state space.
  • Section 4 addresses the case of nonhomogeneous Markov chains, and, besides referencing several works on the subject, provides some details of two procedures which enable the determination of parameters of the intensity matrix; the procedure termed calibration covers the case were the departure point for the procedure is some family of discrete time probability transition matrices with numerical entries, and the procedure termed estimation considers data composed of the trajectories of a chain with an absorbing state and with intensities of arbitrary functional form depending on parameters.
  • In Section 5, we provide some comments on references on alternative estimation or calibration procedures for Markov chains.
  • In Section 6, we present some conclusions drawn from the presentation in this work.

2. Definitions, Notations and Results on Continuous Time Markov Chains

The diversity of the whole universe of Markov chain theories—discrete time, continuous time, homogeneous and nonhomogeneous, with a finite or with an infinite denumerable state space—is fascinating and, since it provides sound foundations for modelling in various sciences, it remains a very active area of study. There are many very good references allowing for preliminary study and providing more in-depth coverage of its most intricate features. Let us briefly detail a selection of these references. Classical introductory readings, although requiring a not insignificant amount of work, are [14,15]. Very interesting introductions to the subject with applications in mind are [16,17,18]. For the most mathematically inclined reader, the two invaluable references [19,20] provide an important complement to the first two quoted references; for a deeper understanding of the many subtleties of the theory, the five following references are essential: [21,22,23,24,25].
We next review—following mainly [26,27,28]—the main concepts and results that will be needed to deal with some aspects of the statistical inference for a nonhomogeneous Markov chain process $(Z_t)_{t \ge 0}$ depending on some—possibly vectorial—parameters.
Definition 1
(Continuous time Markov chain). Let $I := \{1, 2, \dots, m\}$ be the finite set associated with $\Theta = \{\theta_1, \theta_2, \dots, \theta_m\}$. The stochastic process $(Z_t)_{t \ge 0}$ is a continuous time Markov chain with state space $I$ (or $\Theta$) if the following characteristic Markov property is verified; namely, for all $i_0, i_1, \dots, i_n \in I$ and $0 = t_0 < t_1 < \dots < t_n < \infty$, we have that,
$$\mathbb{P}\bigl[Z_{t_n} = i_n \mid Z_{t_{n-1}} = i_{n-1}, \dots, Z_{t_1} = i_1, Z_{t_0} = i_0\bigr] = \mathbb{P}\bigl[Z_{t_n} = i_n \mid Z_{t_{n-1}} = i_{n-1}\bigr].$$
We observe that, by force of the Markov property in Definition 1, the law of a continuous time Markov chain depends only on the following transition probabilities. Let $\mathbb{I}$ be the identity matrix with dimension $\#I$ and let $\delta_{ij}$ be the Kronecker delta, given by:
$$\delta_{ij} = \begin{cases} 0 & i \ne j \\ 1 & i = j. \end{cases}$$
Definition 2
(Transition probabilities). Let $I := \{1, 2, \dots, m\}$, the finite set associated with $\Theta = \{\theta_1, \theta_2, \dots, \theta_m\}$, be the state space of $(Z_t)_{t \ge 0}$, the continuous time Markov chain. The transition probabilities are defined by the characteristic property,
$$\forall i, j \in I,\ \forall s < t,\quad p(s, i, t, j) = \mathbb{P}\bigl[Z_t = j \mid Z_s = i\bigr] \quad \text{and} \quad p(t, i, t, j) = \delta_{ij}.$$
Let $\mathcal{L}(\mathbb{R}^{\#I})$ be the space of the square matrices with coefficients in $\mathbb{R}$. The transition probability matrix function $P : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathcal{L}(\mathbb{R}^{\#I})$ is defined by,
$$\forall i, j \in I,\ \forall s < t,\quad P(s, t) = \bigl[p(s, i, t, j)\bigr]_{i, j \in I} \quad \text{and} \quad P(t, t) = \mathbb{I}.$$
The transition probabilities of Markov processes in general satisfy a very important functional equation that results from the Markov property.
Theorem 1
(Chapman–Kolmogorov equations). Consider a nonhomogeneous continuous time Markov chain as given in Definition 1. Let P be its transition probability matrix function as given in Definition 2. We then have:
$$\forall s, u, t,\ 0 \le s < u < t,\quad P(s, t) = P(s, u)\, P(u, t).$$
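As a quick numerical illustration (ours, not from the cited sources): for a homogeneous chain, $P(s, t) = e^{(t - s) Q}$, and the Chapman–Kolmogorov equations reduce to the semigroup identity $e^{(t - s) Q} = e^{(u - s) Q}\, e^{(t - u) Q}$. A minimal Python sketch, assuming a hypothetical two-state generator:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical two-state generator (rows sum to zero).
Q = np.array([[-0.3, 0.3],
              [0.5, -0.5]])

def P(s, t):
    """Transition matrix of the homogeneous chain: P(s, t) = exp((t - s) Q)."""
    return expm((t - s) * Q)

s, u, t = 0.0, 1.2, 3.0
lhs = P(s, t)
rhs = P(s, u) @ P(u, t)
print(np.allclose(lhs, rhs))  # True: the Chapman-Kolmogorov identity holds
```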
The concept of a transition intensity is essential to the description of conditions allowing for the structural representation of continuous time Markov chains.
Definition 3
(Transition intensities). Let $\mathcal{L}(\mathbb{R}^{\#I})$ be the space of square matrices with coefficients in $\mathbb{R}$. A function $Q : \mathbb{R} \to \mathcal{L}(\mathbb{R}^{\#I})$ denoted by:
$$Q(t) = \bigl[q(t, i, j)\bigr]_{i, j \in I},$$
is a transition intensity if the following characteristic set of properties is verified; that is, for almost all $t \ge 0$,
(i) $\forall i \in I,\ \forall t \ge 0,\ q(t, i, i) \le 0$;
(ii) $\forall i, j \in I,\ i \ne j,\ \forall t \ge 0,\ 0 \le q(t, i, j) \le -q(t, i, i)$;
(iii) $\forall i \in I,\ \sum_{j \in I} q(t, i, j) = 0$.
Theorem 2
(Backward and Forward Kolmogorov equations). Suppose that the matrix transition probability $P(s, t)$ is continuous at $s$, that is:
$$\lim_{t \to 0} P(0, t) = \mathbb{I} \quad \text{and} \quad \lim_{t \downarrow s} P(s, t) = \lim_{t \uparrow s} P(t, s) = \mathbb{I}.$$
If there exists $Q$ such that:
$$Q(t) = \lim_{\substack{k + h \to 0^+ \\ k \ge 0,\ h \ge 0}} \frac{P(t - k, t + h) - \mathbb{I}}{k + h} = \lim_{\substack{h \to 0 \\ h > 0}} \frac{P(t, t + h) - \mathbb{I}}{h} = \lim_{\substack{k \to 0 \\ k > 0}} \frac{P(t - k, t) - \mathbb{I}}{k},$$
then, we have the backward Kolmogorov (matrix) equation:
$$\frac{\partial}{\partial s} P(s, t) = -Q(s)\, P(s, t), \quad P(s, s) = \mathbb{I},$$
and the forward Kolmogorov (matrix) equation:
$$\frac{\partial}{\partial t} P(s, t) = P(s, t)\, Q(t), \quad P(t, t) = \mathbb{I}.$$
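For a nonhomogeneous chain, $P(s, t)$ is generally not a matrix exponential, but the forward equation can be integrated numerically. A minimal sketch of this (our illustration, with a hypothetical time-dependent two-state generator):

```python
import numpy as np
from scipy.integrate import solve_ivp

def Q(t):
    """Hypothetical time-dependent generator; rows sum to zero."""
    a, b = 0.2 + 0.1 * t, 0.4
    return np.array([[-a, a], [b, -b]])

def forward_rhs(t, p_flat):
    # Forward Kolmogorov equation: d/dt P(s, t) = P(s, t) Q(t).
    return (p_flat.reshape(2, 2) @ Q(t)).ravel()

s, T = 0.0, 5.0
sol = solve_ivp(forward_rhs, (s, T), np.eye(2).ravel(), rtol=1e-8, atol=1e-10)
P_sT = sol.y[:, -1].reshape(2, 2)
print(P_sT)
print(P_sT.sum(axis=1))  # each row sums to 1, up to solver tolerance
```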
Remark 1.
The general theory of Markov processes shows that the condition that $P(s, t)$ is continuous in both $s$ and $t$ is sufficient to ensure the existence of the matrix of intensities $Q$ given in Formula (4) (see [26] (p. 232)). The determination of the most general conditions for the existence and uniqueness of solutions of the Kolmogorov equations is an active subject of research with remarkable advances in recent years (see the series of works [29,30,31]).
Corollary 1
(Hostinsky’s representation). Let $Q$ be a transition intensity and $P$ the corresponding transition probability matrix given by Formula (5) or Formula (6). If, for a function $k : [0, +\infty[ \to [0, +\infty[$, the following condition is verified:
$$\forall i \in I,\ \forall u \in [0, +\infty[,\quad |q(u, i, i)| \le k(u) \quad \text{and} \quad \forall s, t \ge 0,\ \int_s^t k(u)\, du < +\infty,$$
we then have,
$$P(s, t) = \mathbb{I} + \int_s^t Q(u)\, du + \sum_{n \ge 2} \int_s^t ds_1 \int_{s_1}^t ds_2 \cdots \int_{s_{n-1}}^t Q(s_1)\, Q(s_2) \cdots Q(s_n)\, ds_n,$$
and,
$$P(s, t) = \mathbb{I} + \int_s^t Q(u)\, du + \sum_{n \ge 2} \int_s^t dt_1 \int_s^{t_1} dt_2 \cdots \int_s^{t_{n-1}} Q(t_1)\, Q(t_2) \cdots Q(t_n)\, dt_n,$$
with the series being absolutely and uniformly convergent for any $s, t$ in a compact interval of $[0, +\infty[$.
Note that Formula (9) is, in fact, an alternative representation of the product integral (see [32] (p. 25)), which is useful in the estimation theory of Markov chains as referred to in Section 5.
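To make the link with the product integral concrete, here is a hedged sketch (ours): composing first-order factors $\mathbb{I} + Q(u)\, du$ over a fine partition of $[s, t]$ approximates $P(s, t)$, which is precisely the product-integral reading of the series representation; the generator is the same hypothetical one used in the previous sketch:

```python
import numpy as np

def Q(t):
    # Same hypothetical two-state generator as in the previous sketch.
    a, b = 0.2 + 0.1 * t, 0.4
    return np.array([[-a, a], [b, -b]])

def product_integral(s, t, n=20000):
    """Approximate P(s, t) by the finite product of factors I + Q(u_k) du."""
    P = np.eye(2)
    grid, du = np.linspace(s, t, n, retstep=True)
    for u in grid[:-1]:
        P = P @ (np.eye(2) + Q(u) * du)
    return P

print(product_integral(0.0, 5.0))  # agrees with the ODE solution above
```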
Under the condition stated in Formula (7), we have a structural representation of the continuous time Markov chain process $(Z_t)_{t \ge 0}$ (see [26] (p. 229) and also [25] (pp. 279–281)).
Theorem 3
(A discrete time representation of a continuous time Markov chain). Let the intensities satisfy the condition given by Formula (7) in Corollary 1. Then, there exists a sequence of stopping times $(\tau_n)_{n \ge 1}$—the times of the jumps of the Markov chain—such that, with the sequence $(Y_n)_{n \ge 1}$ defined by $Y_n = Z_{\tau_n}$, the process defined by:
$$Z_t = \sum_{n=0}^{+\infty} Y_n\, \mathbb{1}_{[\tau_n, \tau_{n+1}[}(t) = \sum_{n=0}^{+\infty} Z_{\tau_n}\, \mathbb{1}_{[\tau_n, \tau_{n+1}[}(t)$$
is a continuous time Markov chain with transition probabilities $P$ given by Definition 2 and transition intensities $Q$ given by Definition 3 and Theorem 2. Moreover, $(Y_n)_{n \ge 1}$ is a Markov process.
Theorem 3 allows the definition of the imbedded Markov chain associated with a continuous time Markov chain $(Z_t)_{t \ge 0}$, that is, a Markov chain process with intensity matrix $Q^\theta = \bigl[q(i, j)^\theta\bigr]_{i, j \in I}$. In order to recover the notation of [33] (p. 37) (see also [34] (p. 267)), we consider the sequence of stopping times $(\tau_n)_{n \ge 1}$ given by the jump times and then define the sequence $(\rho_n)_{n \ge 1}$ of staying times, that is, the times in-between jumps, by:
$$\rho_1 := \tau_1, \qquad \rho_n := \tau_n - \tau_{n-1}, \quad n \ge 2,$$
in such a way that $\tau_n = \rho_1 + \dots + \rho_n$.
Definition 4
(The imbedded Markov chain). Given $(Z_t)_{t \ge 0}$, the continuous time Markov chain process with intensity matrix $Q^\theta = \bigl[q(i, j)^\theta\bigr]_{i, j \in I}$ such that $q(i, j)^\theta > 0$, and with the Markov process $(Y_n)_{n \ge 1}$ defined in Theorem 3, the discrete time Markov process $(Y_n, \rho_n)_{n \ge 1}$, with state space given by the Cartesian product of the state space of $(Z_t)_{t \ge 0}$—and $(Y_n)_{n \ge 1}$—by $[0, +\infty[$, is the imbedded Markov chain.
The imbedded Markov chain will be used in a result in Section 3 below.
We should also mention [35,36] for the important concept of—what should be called—the reverse imbedding problem; that is, given a discrete time Markov chain, the determination of a continuous time Markov chain such that the given discrete time Markov chain is the imbedded Markov chain in the continuous one. In [37], a discussion is presented concerning the bridging from rigorous mathematical results on the existence of generators—tantamount to a solution to the existence of an imbedded Markov chain given some data—to their computational implementation. A study on different forms of approximating continuous time Markov chains by discrete time ones is given in [38], using different Skorokhod topologies adapted to different embeddings: step functions, linear interpolation, multistep, etc.

3. Homogeneous Markov Chains

The statistical estimation of the parameters of continuous time homogeneous Markov chains is a well-studied subject for which there exist very important references. Let us refer first to the two highly praised, very important works [33,39] and to a more recent one with important applications in mind [16]. The book [33] presents the maximum likelihood method for general Markov processes, first in discrete time, and then in continuous time for processes that are not diffusions, by considering the imbedded Markov chain process in discrete time—see Definition 4—allowing the exploitation of the results previously obtained in the discrete time setup.
The article [39] contains an extensive list of references and a summary of the main ideas on the subject, which are clearly developed in the book by the same author. Noticing that the estimation procedures apply to irreducible Markov chains with only recurrent and non-null states, we select from [33] (pp. 45–51) the following fundamental result, Theorem 8.1, included in a section devoted to homogeneous Markov chains with finite state space. We note that for the notions of recurrence and transience in homogeneous Markov chains, an introductory reference is, for instance, [17] (pp. 79–85).
Theorem 4
(Asymptotics of MLE estimators when the length of the one observed trajectory goes to infinity). Suppose that $(Z_t)_{t \ge 0}$, the Markov chain process with intensity matrix $Q^\theta = \bigl[q(i, j)^\theta\bigr]_{i, j \in I}$ and parameter $\theta = (\theta_1, \dots, \theta_u, \dots, \theta_d) \in \Theta \subseteq \mathbb{R}^d$, satisfies the following hypotheses.
1. The trajectories of $(Z_t)_{t \ge 0}$ are right continuous step functions; we have that $\sum_{j \ne i} q(i, j)^\theta > 0$; the transition probabilities satisfy Formula (3) and the intensities satisfy Formula (4) stated in Theorem 2.
2. The set $D := \bigl\{(i, j) : q(i, j)^\theta > 0\bigr\}$ does not depend on $\theta$; we have that $q(i, j)^\theta \in C^3(\Theta)$; and the matrix,
$$\Bigl[\frac{\partial}{\partial \theta_u} q(i, j)^\theta\Bigr]_{(i, j) \in D,\ 1 \le u \le d},$$
has rank $d$ for all $\theta = (\theta_1, \dots, \theta_u, \dots, \theta_d) \in \Theta$.
3. For each $\theta \in \Theta$, the Markov chain $(Y_n)_{n \ge 1}$ has only one ergodic set and there are no transient states.
Then, with $t_{ij}$ denoting the number of direct jumps from $i$ to $j$ in the sample function observed until time $t$, and with $\gamma_i$ denoting the total amount of time the chain is in state $i$ until time $t$, there exists a consistent solution $\hat{\theta} \in \Theta$ of the likelihood equations given by:
$$\frac{\partial}{\partial \theta_u} L_t(\theta) = \sum_{(i, j) \in D} \Bigl[ t_{ij}\, \frac{\partial}{\partial \theta_u} \ln q(i, j)^\theta - \gamma_i\, \frac{\partial}{\partial \theta_u} q(i, j)^\theta \Bigr] = 0.$$
Moreover, with $\theta^0 \in \Theta$ being the true parameter value and $\nu(t) := \max\{k : \tau_k < t\}$ being the number of jumps that occur until time $t$, we have that:
(i) With the limit being taken in probability, $\hat{\theta} \to \theta^0$ as $t \to +\infty$ (equivalently, as the number of jumps $\nu(t) \to +\infty$);
(ii) With the limit being taken in probability, $q(i, j)^{\hat{\theta}} \to q(i, j)^{\theta^0}$ as $t \to +\infty$;
(iii) With the limit being taken in law (denoted by $\mathcal{L}$):
$$2\Bigl[\max_{\theta \in \Theta} L_t(\theta) - L_t(\theta^0)\Bigr] = 2 \sum_{(i, j) \in D} \Bigl[ t_{ij} \ln \frac{q(i, j)^{\hat{\theta}}}{q(i, j)^{\theta^0}} - \gamma_i \bigl( q(i, j)^{\hat{\theta}} - q(i, j)^{\theta^0} \bigr) \Bigr] \xrightarrow[t \to +\infty]{\mathcal{L}} \chi_r^2.$$
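In the particular case where the free parameters are the off-diagonal intensities themselves, the likelihood equations above are solved in closed form by the well-known estimator $\hat{q}(i, j) = t_{ij} / \gamma_i$. The following minimal sketch (ours, not from [33]) simulates one long trajectory of a hypothetical homogeneous chain, accumulates the sufficient statistics $t_{ij}$ and $\gamma_i$ of Theorem 4, and recovers the generator:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-state generator (rows sum to zero).
Q = np.array([[-1.0, 0.6, 0.4],
              [0.3, -0.8, 0.5],
              [0.7, 0.2, -0.9]])

def simulate_counts(Q, T, state=0):
    """Simulate one trajectory on [0, T]; return jump counts t_ij and
    occupation times gamma_i."""
    m = Q.shape[0]
    counts, gamma = np.zeros((m, m)), np.zeros(m)
    t = 0.0
    while True:
        rate = -Q[state, state]
        hold = rng.exponential(1.0 / rate)       # sojourn time in `state`
        if t + hold >= T:
            gamma[state] += T - t
            return counts, gamma
        gamma[state] += hold
        t += hold
        probs = np.where(np.arange(m) == state, 0.0, Q[state] / rate)
        new_state = rng.choice(m, p=probs / probs.sum())
        counts[state, new_state] += 1
        state = new_state

counts, gamma = simulate_counts(Q, T=20000.0)
Q_hat = counts / gamma[:, None]                  # MLE of the off-diagonals
np.fill_diagonal(Q_hat, -Q_hat.sum(axis=1))      # rows must sum to zero
print(np.round(Q_hat, 2))                        # close to Q for large T
```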
We observe that if the Markov chain has a sole trap or absorbing state, other estimation procedures have to be devised to take advantage of data composed of a significant number of trajectories all ending in the trap state after a finite number of jumps.
An extension of the regularity conditions needed for the more general estimation results in [33] is given in [40]. A study of the rate of convergence of the maximum likelihood estimator based on the observation of a trajectory, using the Gärtner–Ellis theorem for random processes, is given in [41].
An interesting follow-up appeared in [42] dealing with the situation where it may happen that when estimating the parameters of a stationary Markov chain, the chosen model is incorrect, in the sense that the probability law governing the evolution is not an element of the parametric family stipulated. Interesting convergence results are presented.
The article [43] covers the statistical estimation for Markov jump processes with finite state space in the context of observations at discrete time points—also termed partially observed Markov chains—by studying conditions for the existence and uniqueness of the maximum likelihood estimators. Examples and simulation studies are presented to illustrate the methodology.
The four works [44,45,46,47] deal with an important subject that may be summarily described as perturbation bounds for homogeneous Markov chains; the results on this subject—see Theorem 5 for an example in the nonhomogeneous case—have important applications, for instance, allowing for control of estimation procedures or for relative indifference in the choice of the intensity matrices.

4. Nonhomogeneous Markov Chains

In this section, we consider nonhomogeneous continuous time Markov chains with a finite state space. Whenever a parameter estimation, or calibration, problem for a model of observations is considered by means of nonhomogeneous continuous time Markov chains, we have to take into preliminary consideration three important facts.
  • Generally, we will try to determine the intensities (see Definition 3) that will generate the transition probabilities (see Definition 2), following the classical and very successful modelling approach, tantamount to finding a model of differential equations for the phenomena, in this case, the Kolmogorov equations. In actuarial mathematics, this methodology has been referred to as the Transition Intensity Approach (TIA) (see [48] and…).
  • The intensities may have arbitrary functional forms—linear, piecewise constant (see [49] (p. 42)), polynomial, exponential of Gompertz, Makeham or Gompertz–Makeham type (see [49] (pp. 24–25) or [50] (pp. 205–206)) used in health insurance and long-term care modelling—where the functional forms are dependent on parameters that must be fitted to the data either by estimation or calibration.
  • The characteristics of the available data relating to observations of the phenomena, together with the general—and most often qualitative—properties expected from the model, are decisive both for the choice of the modelling approach and, in the case of modelling by the Kolmogorov equations, for the choice of the functional form of the intensities and of the relevant parameters to be estimated or calibrated.
The constructive definition of a nonhomogeneous continuous time Markov chain follows [27] (pp. 242–246) and, in a more condensed version, [28] (pp. 349–351). Whenever we are faced with a specific estimation-calibration problem, a first decision that conditions all the subsequent decisions is the functional form of the intensities. If there are no imperative reasons justifying a preference for one functional form over others, the choice of the functional form is somewhat indifferent, for the following reason. Two sets of intensities may be seen as perturbations of one another, and so results on how perturbed intensities are reflected in the corresponding transition probabilities—such as the ones presented in [51]—are extremely useful. The following result is an example, showing that if the distance between the corresponding probability matrices can be controlled by the distance between the intensity matrices, the choice of the functional form is indifferent.
Theorem 5
(Continuous dependence of the transition probabilities on the transition intensities). Let $\|\cdot\|$ be a matrix norm and let $Q_1(t, \theta)$ and $Q_2(t, \theta)$ be two matrices of intensities norm-bounded by $M > 0$ in $[0, T]$. Define:
$$\epsilon_{Q_1, Q_2} := \sup_{t \in [0, T],\ \theta \in \Theta} \bigl\| Q_1(t, \theta) - Q_2(t, \theta) \bigr\|.$$
Then, we have that,
$$\sup_{t \in [x, T],\ \theta \in \Theta} \bigl\| P_1(x, t, \theta) - P_2(x, t, \theta) \bigr\| \le \epsilon_{Q_1, Q_2}\, \frac{e^{M |T - x|} - 1}{M},$$
where $P_1(x, u, \theta)$ and $P_2(x, u, \theta)$ are the solutions of the Kolmogorov equations—given either in Formula (5) or in Formula (6)—with the matrices of intensities, respectively, $Q_1(t, \theta)$ and $Q_2(t, \theta)$.
For a proof and an application of this result to a practical problem, see [52].
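A numerical sanity check of the bound (ours, with hypothetical intensities and the parameter dependence suppressed): perturb a generator, integrate the forward equation for both intensity matrices, and compare the observed distance with the right-hand side of Formula (14):

```python
import numpy as np
from scipy.integrate import solve_ivp

def make_Q(eps):
    def Q(t):
        a = 0.2 + 0.1 * t + eps    # perturbed exit rate of state 1
        return np.array([[-a, a], [0.4, -0.4]])
    return Q

def solve_P(Q, x, T, n=200):
    rhs = lambda t, p: (p.reshape(2, 2) @ Q(t)).ravel()
    ts = np.linspace(x, T, n)
    sol = solve_ivp(rhs, (x, T), np.eye(2).ravel(), t_eval=ts, rtol=1e-10)
    return ts, sol.y.T.reshape(-1, 2, 2)

x, T, eps = 0.0, 2.0, 0.05
Q1, Q2 = make_Q(0.0), make_Q(eps)
ts, P1 = solve_P(Q1, x, T)
_, P2 = solve_P(Q2, x, T)

norm = lambda A: np.linalg.norm(A, ord=np.inf)      # max absolute row sum
dist = max(norm(a - b) for a, b in zip(P1, P2))
eps_Q = max(norm(Q1(t) - Q2(t)) for t in ts)        # epsilon_{Q1, Q2}
M = max(max(norm(Q1(t)), norm(Q2(t))) for t in ts)  # common norm bound
bound = eps_Q * (np.exp(M * (T - x)) - 1) / M
print(dist, bound, dist <= bound)                   # the bound holds
```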
Remark 2
(Consequences and application of Theorem 5). One of the consequences of this result is that it gives additional freedom in the choice of the functional form of the transition intensities, since if two intensity matrices are close to one another—in the supremum norm in Formula (13)—then the corresponding probability matrix functions will be close to one another in the same norm. The $(x, T)$-dependent factor in Formula (14) is not a limitation: since this factor is equivalent to $|T - x|$ as $T \to x$, we can apply the result to a decomposition of the time interval $[0, T]$ into a finite number of intervals, in such a way that the distance between the probability matrix functions is as small as is wanted.
In an alternative comparison result [53], the authors consider the asymptotic behaviour of the log-likelihood ratio as a measure of the deviation between a sequence of real valued random variables and a continuous-state nonhomogeneous Markov chain. A class of strong deviation theorems for bivariate functions of such sequences, associated with the continuous-state nonhomogeneous Markov chains, is established using the supermartingale limit theorem.

4.1. Calibration of Intensities of a Nonhomogeneous Markov Chain

In this section, we describe a methodology termed the calibration of intensities, introduced for the purpose of a study on long-term care in [54] and further detailed in [55] in a formulation that we reproduce next. Due to the type of applications in mind, the Markov chains under study are assumed to have a finite state space with only one cemetery state.
Let us consider the result presented in Theorem 6. Suppose that we observe a phenomenon in discrete time, for instance, the periodical—say, monthly—recordings of events that may occur in any day of the month. If the state space of the observations is finite or denumerable, the data may allow the phenomenon to be modelled by a discrete time Markov chain, although we know that a more adequate model would be a continuous time Markov chain, possibly even nonhomogeneous. The calibration procedure proposed allows fitting the parameters of a given functional form of intensities by minimising a discrepancy between the transition probabilities of the discrete time model and the transition probabilities of the continuous time model.
Theorem 6
(Calibration of intensities of a continuous chain by discrete observations). Let, for $1 \le n \le N$, $R_{\tau_n} = \bigl[r_{ij}(\tau_n)\bigr]_{i, j = 1, \dots, r}$ be the generic element of a sequence of numerical transition matrices taken at a sequence of increasing dates $(\tau_n)_{1 \le n \le N}$. Consider a set of intensities $Q(u, \lambda) = \bigl[q(u, i, j, \lambda)\bigr]_{i, j = 1, \dots, r}$—with $\lambda \in \Lambda \subset \mathbb{R}^d$ being a parameter and $\Lambda$ being a compact set—satisfying the following conditions:
1. For every fixed $\lambda$, the functions $q(u, i, j, \lambda)$ are measurable as functions of $u$.
2. For every fixed $u$, the functions $q(u, i, j, \lambda)$ are continuous as functions of $\lambda$.
3. There exists a locally integrable function $M : [0, +\infty[ \to [0, +\infty[$, such that for all $\lambda \in \Lambda$, $i \in I$, $u \in [0, +\infty[$ and $0 \le s \le t$, the following conditions are verified:
$$|q(u, i, i, \lambda)| \le M(u) \quad \text{and} \quad \int_s^t M(u)\, du < +\infty.$$
Consider the following.
1. We know that there exists $P(s, t, \lambda) = \bigl[p(s, i, t, j, \lambda)\bigr]_{i, j = 1, \dots, r}$, a probability transition matrix with entries absolutely continuous in $s$ and $t$, such that the conditions in Definition 2 and the equations in Theorems 1 and 2 are verified.
2. For each fixed $s_0$, we can consider the loss function given by,
$$O(s_0, \lambda) := \sum_{i, j = 1, \dots, r} \sum_{n = 1}^{N} \bigl( p(s_0, i, \tau_n, j, \lambda) - r_{ij}(\tau_n) \bigr)^2.$$
Then, we have that for the optimisation problem, $\inf_{\lambda \in \Lambda} O(s_0, \lambda)$, there exists $\lambda_0 \in \Lambda$ such that,
$$O(s_0, \lambda_0) = \min_{\lambda \in \Lambda} O(s_0, \lambda),$$
the unique minimum value being possibly attained at several distinct points $\lambda_0 \in \Lambda$.
Proof. 
See [55] for a proof. □
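A hedged sketch of how the calibration of Theorem 6 can be carried out in practice (ours; the one-parameter intensity family and the observed matrices are hypothetical): for each candidate $\lambda$, integrate the forward equation from $s_0$ to each date $\tau_n$, evaluate the loss (16), and minimise over the compact parameter set:

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize_scalar

taus = np.array([1.0, 2.0, 3.0])            # observation dates tau_n
R = [np.array([[0.82, 0.18], [0.0, 1.0]]),  # observed matrices R_{tau_n}
     np.array([[0.67, 0.33], [0.0, 1.0]]),
     np.array([[0.55, 0.45], [0.0, 1.0]])]

def Q(t, lam):
    """Hypothetical one-parameter intensity family; state 2 is absorbing."""
    rate = lam * (1 + 0.1 * t)
    return np.array([[-rate, rate], [0.0, 0.0]])

def P(s0, t, lam):
    rhs = lambda u, p: (p.reshape(2, 2) @ Q(u, lam)).ravel()
    sol = solve_ivp(rhs, (s0, t), np.eye(2).ravel(), rtol=1e-8)
    return sol.y[:, -1].reshape(2, 2)

def loss(lam, s0=0.0):
    # Formula (16): squared discrepancy summed over entries and dates.
    return sum(((P(s0, tn, lam) - Rn) ** 2).sum() for tn, Rn in zip(taus, R))

res = minimize_scalar(loss, bounds=(0.01, 2.0), method="bounded")
print(res.x, res.fun)  # calibrated lambda_0 and the attained minimum
```

For higher-dimensional parameter sets, the same loss can be handed to a multivariate optimiser; as Remark 3 below points out, each evaluation requires a numerical solution of the Kolmogorov equations, which dominates the computational cost.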
Remark 3
(On possible applications of Theorem 6). In [54], we applied the result to a sequence of transition matrices $\bigl(\bigl[r_{ij}(n)\bigr]_{i, j = 1, \dots, r}\bigr)_{n \ge 1}$ derived from a single observed discrete time transition matrix $P$ and the sequence of its powers $P^n$, for $n \ge 2$, corresponding to the transition in $n$ steps. We observe that the solution of the optimisation problem in Formula (16) may require heavy computational power, since it depends on the numerical solution of the Kolmogorov equations at each step, and the dimension of the optimisation space depends on the number of states and on the number of intensity parameters to calibrate.
Remark 4
(An open problem related to the calibration procedure in Theorem 6). Let us suppose that the sequence of transition matrices $R_{\tau_n} = \bigl[r_{ij}(\tau_n)\bigr]_{i, j = 1, \dots, r}$, taken at a sequence of increasing dates $(\tau_n)_{1 \le n \le N}$ in the statement of Theorem 6, corresponds to a sequence of observations of an unknown continuous time nonhomogeneous Markov chain transition matrix. Let $\lambda_0^N \in \Lambda$ be one solution of the optimisation problem given by Formula (17). A relevant open problem is to determine the conditions under which the sequence of transition matrices given by $P(s, t, \lambda_0^N) = \bigl[p(s, i, t, j, \lambda_0^N)\bigr]_{i, j = 1, \dots, r}$, for $N \ge 1$, converges to the unknown continuous time nonhomogeneous Markov chain transition matrix.

4.2. Estimation of Nonhomogeneous Markov Chains

A useful approach to estimating the parameters of non-constant intensities, widely used in actuarial mathematics in the modelling of health insurance and long-term care problems using Markov chains—in a broad methodology termed multiple state models—essentially consists of two steps.
  • Initially, the intensities are supposed to be constant over each interval of a partition of the time interval under study; such an interval may cover one year, in the case where the age of the subject matters, or a period of more than a year, in the case where the variation in intensities over the period is assumed to be negligible. The estimation is performed in each interval using the effective methods of maximum likelihood for homogeneous continuous time Markov chains (see Section 3); for this first step, see [56] (pp. 683–690) for details and examples of applications.
  • Using additional data of the population for each specific age—such as gender, smoking habits, weight, exercise habits—a fitting of the intensity parameters is performed by means of generalised linear models; this complementary methodology is termed graduation of the intensities (a schematic sketch follows this list). See [57] (pp. 126–128) and [58,59] for detailed explanations of graduation and [60,61] for examples of applications.
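As a schematic illustration of the graduation step (ours, not reproducing any cited study): crude transition counts per age band, with their exposures, are smoothed by a Poisson generalised linear model with log link, which for a linear age effect amounts to a Gompertz-type graduation, $\log \mu(x) = \beta_0 + \beta_1 x$. A minimal sketch assuming the statsmodels package; all data below are synthetic:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic crude data per age band: transition counts and central exposures.
ages = np.arange(60, 80)
exposure = np.full(ages.shape, 500.0)           # person-years gamma_i per age
rng = np.random.default_rng(1)
true_mu = 0.0005 * np.exp(0.09 * (ages - 60))   # Gompertz-type intensity
counts = rng.poisson(true_mu * exposure)        # observed transition counts

# Poisson GLM with log link and exposure offset: log mu(x) = b0 + b1 * x,
# i.e., a Gompertz graduation of the crude rates counts / exposure.
X = sm.add_constant(ages.astype(float))
fit = sm.GLM(counts, X, family=sm.families.Poisson(), exposure=exposure).fit()
print(fit.params)                               # b0, b1
print(np.exp(fit.params[0]), fit.params[1])     # Gompertz level and slope
```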
Since its inception, this approach has been in use, with variants dealing with more specific problems, in several works; one such example is [62]. In [63], the focus is on the estimation of the annual transition probabilities incorporating external data to the time and jump counts.
In the rest of this section, we describe a methodology proposed and applied in [52] for the estimation of the parameters of non-constant intensities of arbitrary functional form. The Markov chains under study are assumed to have a finite state space with an absorbing state. This methodology is based on a known algorithmic constructive procedure for nonhomogeneous Markov chains (see [27] (p. 266) and also [28] (pp. 349–351)). The constructive methodology allowing one to recover a homogeneous Markov chain with denumerable state space is presented in [64] (pp. 473–475). Consider a transition intensity matrix $Q(t, \theta)$, with $\theta \in \Theta \subseteq \mathbb{R}^d$ a parameter that we intend to estimate.
Definition 5
(Constructive definition of a Markov chain). Consider the entries of a transition intensity matrix,
$$Q(t, \theta) = \bigl[\mu_{ij}^\theta(t)\bigr]_{i, j = 1, \dots, d},$$
and define:
$$p(t, i, j) = \begin{cases} (1 - \delta_{ij})\, \dfrac{\mu_{ij}^\theta(t)}{-\mu_{ii}^\theta(t)} & \mu_{ii}^\theta(t) \ne 0 \\ \delta_{ij} & \mu_{ii}^\theta(t) = 0, \end{cases}$$
where $\delta_{ij}$ is the Kronecker delta. Suppose that $X_0 = i$, according to some initial distribution on $\{1, 2, \dots, d\}$.
1. We define by induction the jump sequence of stopping times $(\tau_n)_{n \ge 0}$ as follows; firstly, we set $\tau_0 \equiv 0$.
2. Then, the stopping time $\tau_1$, that is, the sojourn time in state $i$ and also the time of the first jump, has a distribution function of exponential type given by:
$$F_{\tau_1}(t) = \mathbb{P}\bigl[\tau_1 \le t\bigr] = 1 - \exp\Bigl( \int_0^t \mu_{ii}^\theta(u)\, du \Bigr).$$
The form of the distribution of $\tau_1$ is a consequence of a general result on the distribution of the sojourn times of a continuous time Markov chain (see Theorem 2.3.15 in [26] (p. 221)).
3. Given that the process is in state $i$, it now may jump to state $j$ at time $\tau_1 = s_1$ with probability $p(t, i, j)$ defined in Formula (18), that is,
$$\mathbb{P}\bigl[X_{s_1} = j \mid \tau_1 = s_1, X_0 = i\bigr] = p(s_1, i, j),$$
and so we have that $X_t = i$ for $0 \equiv \tau_0 \le t < \tau_1$.
4. Given that $\tau_1 = s_1$ and $X_{s_1} = j$, we have that $\tau_2$, the stopping time of the second jump, has, again, a distribution function of exponential type given by:
$$F_{\tau_2 \mid \tau_1 = s_1}(t) = \mathbb{P}\bigl[\tau_2 \le t \mid \tau_1 = s_1\bigr] = 1 - \exp\Bigl( \int_0^t \mu_{jj}^\theta(u + s_1)\, du \Bigr),$$
and we also have that,
$$\mathbb{P}\bigl[X_{s_2} = k \mid \tau_1 = s_1, X_0 = i, \tau_2 = s_2, X_{s_1} = j\bigr] = p(s_1 + s_2, j, k),$$
and, consequently, we have $X_t = j$ for $\tau_1 \le t < \tau_2$.
5. We proceed inductively to define the stopping time $\tau_3$ and so on.
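A minimal simulation sketch of the constructive procedure of Definition 5 (ours; the three-state intensity matrix, with state 3 absorbing, is hypothetical). For simplicity, the sojourn-time distribution functions such as Formula (19) are sampled by fine time discretisation rather than by exact inversion:

```python
import numpy as np

rng = np.random.default_rng(2)

def Q(t):
    """Hypothetical 3-state intensity matrix; state 3 (index 2) is absorbing."""
    m12, m13 = 0.05 + 0.01 * t, 0.02 + 0.005 * t
    m21, m23 = 0.03, 0.04 + 0.01 * t
    return np.array([[-(m12 + m13), m12, m13],
                     [m21, -(m21 + m23), m23],
                     [0.0, 0.0, 0.0]])

def simulate(T=100.0, dt=0.01, state=0):
    """Discretised version of Definition 5: on each small step the chain
    leaves state i with probability ~ -mu_ii(t) dt; the jump target is drawn
    with the probabilities p(t, i, j) of Formula (18)."""
    t, path = 0.0, [(0.0, state)]
    while t < T and state != 2:
        q = Q(t)
        if rng.random() < -q[state, state] * dt:     # a jump occurs
            p = q[state] / (-q[state, state])        # Formula (18), j != i
            p[state] = 0.0
            state = rng.choice(3, p=p / p.sum())
            path.append((t, state))
        t += dt
    return path

print(simulate()[:5])  # (jump time, attained state) pairs of one trajectory
```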
The following result—compare with Theorem 3—ensures that the preceding construction yields a nonhomogeneous Markov chain with the given intensity matrix.
Theorem 7
(A continuous time Markov chain with a prescribed intensity matrix function). Let the intensity matrix be norm-bounded by a Lebesgue integrable function in $[0, T]$. Then, given the times $(\tau_n)_{n \ge 0}$, we have that, with the sequence $(Y_n)_{n \ge 1}$ defined by $Y_n = X_{\tau_n}$, the process defined by:
$$X_t = \sum_{n=0}^{+\infty} Y_n\, \mathbb{1}_{[\tau_n, \tau_{n+1}[}(t) = \sum_{n=0}^{+\infty} X_{\tau_n}\, \mathbb{1}_{[\tau_n, \tau_{n+1}[}(t)$$
is a continuous time Markov chain with transition probability matrix $P$ and transition intensity matrix $Q$.
Proof. 
For a proof of this result in the general case of continuous time Markov processes, see [26] (pp. 221–233). □
We now proceed to detail a methodology to identify the parameter $\theta \in \Theta$, inspired by the constructive definition of the Markov chain in Definition 5. We assume that we have at our disposal a sample of trajectories of the continuous time nonhomogeneous Markov chain; see Figure 1, taken from [52], as an example. Schematically, the data consist of a large number of lines—each line being a trajectory—with each line giving, in this order, the initial state, the time until the first jump, the second attained state, the time until the second jump, the third attained state, and so on, until eventually the trajectory ends in the absorbing or trap state.
(i) Given a state $i$, we have to find a fitting for the distribution of the random sojourn times in state $i$. According to Formula (19), these times have a distribution function of exponential type with intensity $\mu_{ii}^\theta(t)$.
(ii) For every other state $j$, by using $\mathbb{P}\bigl[X_{s_1} = j \mid \tau_1 = s_1, X_0 = i\bigr]$, possibly with an approximation, by Formula (20) we can obtain an approximation of $p(s_1, i, j)$.
(iii) By using Formula (18) and the approximation obtained for $p(s_1, i, j)$, we can obtain an approximation for $\mu_{ij}^\theta(t)$.
(iv) Finally, we fit an intensity of a chosen functional form to $\mu_{ij}^\theta(t)$.
We now describe in detail the procedure for applying the methodology just described.
  • Recall that an observed trajectory has the following structure: (first state, time spent in state, second state, time spent in state, third state, …). Since the real data consist of a finite set of trajectories, even under the hypothesis of an unbounded time horizon for the observations, the maximum length over all trajectories is finite. We select all the trajectories of length greater than 3 that start at state $i = 1$. If the next state is also $i = 1$, the time spent in state—in this case, in state $i = 1$—is the first part of the sample for obtaining $\mu_{11}^\theta(t)$. Select all the trajectories of length greater than 5 for which the second state is $i = 1$; this set of trajectories already contains the previously considered set of trajectories, and so, if the third state is also $i = 1$, the sum of the time spent in the first state and the time spent in the second state will be the second part of the sample for obtaining $\mu_{11}^\theta(t)$. Repeat, successively, the procedure for all trajectories of length greater than 7, then of length greater than 9, of length greater than 11, until the maximum length of all the trajectories is attained, to obtain the full sample for the intensity $\mu_{11}^\theta(t)$.
  • Fit a smooth kernel distribution to the sample obtained for the intensity $\mu_{11}^\theta(t)$.
  • Repeat the procedure used for getting the sample for $\mu_{11}^\theta(t)$, but this time selecting the transitions $1 \to 2$, that is, the transitions from state $i = 1$ to state $i = 2$. Fit a smooth kernel distribution to these data.
  • Now, we look for an estimate of $p(t, i, j)$ given by Formulas (18) and (20) (see the sketch after this list). For that, we will consider rounding the sojourn times—say, to unity, in order to have enough observations—and then group all the observations of jumps from the first state according to this rounding. Consider then the observations towards state $i = 2$. We will then have that:
$$p(s_1, 1, 2) = \mathbb{P}\bigl[X_{s_1} = 2 \mid \tau_1 = s_1, X_0 = 1\bigr] \approx \frac{\mathbb{P}\bigl[X_{s_1} = 2,\ s_1 - 0.5 \le \tau_1 < s_1 + 0.5\bigr]}{\mathbb{P}\bigl[s_1 - 0.5 \le \tau_1 < s_1 + 0.5\bigr]} = \frac{\mathbb{P}\bigl[s_1 - 0.5 \le \tau_1 < s_1 + 0.5 \mid X_{s_1} = 2\bigr] \cdot \mathbb{P}\bigl[X_{s_1} = 2\bigr]}{\mathbb{P}\bigl[s_1 - 0.5 \le \tau_1 < s_1 + 0.5\bigr]},$$
    and the left-most side of the formula can then be estimated from the observations by using the smooth kernel distributions.
  • Resorting to Formula (18), we can compute values for $\mu_{12}^\theta(t)$ and fit a piecewise linear density. That is, using again Formula (18), since $\mu_{11}^\theta(s_1) \ne 0$, we have that for an arbitrary time $t = s_1$:
$$\mu_{12}^\theta(s_1) = -\mu_{11}^\theta(s_1)\, (1 - \delta_{12})\, p(s_1, 1, 2) = -\mu_{11}^\theta(s_1) \cdot p(s_1, 1, 2).$$
  • We then consider a set of values $\mu_{12}^\theta(s_1), \mu_{12}^\theta(s_2), \dots, \mu_{12}^\theta(s_k)$ and fit the multidimensional parameter $\theta \in \Theta$ to these values, using for that purpose a previously selected functional form.
  • These procedures are to be repeated in order to obtain the intensities $\mu_{2j}^\theta$ for $j \ne 2$ and $\mu_{3j}^\theta$ for $j \ne 3$.
  • The intensities $\mu_{jj}^\theta$ for $j = 1, 2, 3$ are obtained in the usual way, as the negative of the sum of the remaining intensities in the row with index $j$ of the intensity matrix.
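The binning step announced above can be sketched as follows (ours; all observations are synthetic, and the constant stand-in for $\mu_{11}^\theta$ replaces the smooth kernel fit described in the text): round the observed first sojourn times to the nearest unit, estimate $p(s_1, 1, 2)$ by the empirical frequency of $1 \to 2$ transitions within each bin, and recover $\mu_{12}^\theta(s_1) \approx -\mu_{11}^\theta(s_1)\, \hat{p}(s_1, 1, 2)$ from Formula (18):

```python
import numpy as np

# Synthetic observations: (first sojourn time in state 1, state jumped to).
rng = np.random.default_rng(3)
sojourns = rng.exponential(5.0, size=2000)
targets = rng.choice([2, 3], size=2000, p=[0.7, 0.3])

bins = np.round(sojourns)                 # round sojourn times to unity
p12_hat = {}
for s1 in np.unique(bins):
    sel = bins == s1
    if sel.sum() >= 30:                   # keep bins with enough observations
        p12_hat[s1] = np.mean(targets[sel] == 2)

# Recover mu_12 on each bin via Formula (18); in the real procedure mu_11_hat
# would come from the smooth kernel fit of the sojourn-time distribution
# (here a constant stands in, since the sojourns were drawn exponential(5)).
mu11_hat = -1.0 / 5.0
mu12_hat = {s1: -mu11_hat * p for s1, p in p12_hat.items()}
print(dict(list(mu12_hat.items())[:3]))
```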
Remark 5
(On several open problems related to the estimation methodology proposed). The methodology presented in this section provided results that can be appreciated in detail in [52]. More precisely, we generated data with intensities given by the Gompertz–Makeham functional form and then, using the methodology just described, we fitted piecewise linear intensities to the data; next, we computed the errors given by the $L^1$ distance between the integrated transition probabilities obtained from the Gompertz–Makeham intensities and the transition probabilities obtained from the fitted piecewise linear intensities, and we observed that the average error per year is always less than 1%, although errors one order of magnitude larger may occur in some specific years.
We must stress that, similarly to what was stated in Remark 4 about the calibration procedure, there are open problems with the estimation procedure stated above; moreover, these problems are subtle in nature due to the fact that the intensities to be estimated—to wit, the functions $\mu_{ij}^\theta(t)$—live in infinite dimensional spaces. The estimation problem may be reduced to a finite dimensional one as soon as some functional form—for instance, continuous piecewise linear functions—is assumed for the intensities, because then there remains only a finite number of degrees of freedom for each intensity—corresponding to the slopes of the linear pieces and the initial value. Under this perspective, the result in Theorem 5 shows that, in fact, the choice of a functional form for the intensities is somewhat indifferent, since if two intensity matrices are close to one another, then the corresponding probability transition matrices will be close to one another. A relevant open problem is, then, to determine the most effective functional form for the intensities given an estimation procedure.

5. Comments on Some Additional References

Let us now review some other approaches to procedures of estimation for both homogeneous and nonhomogeneous Markov chains.
  • In [65], using the formalism of product integration (see [32,66] for expositions of the concept and [67] for an application to the integrated probability transition functions with fixed points of discontinuity), the authors propose an estimator for the transition probabilities of a nonhomogeneous Markov chain with a finite number of states in the presence of censoring, in particular, when the processes are only observed part of the time. As a first step, an estimator for the integrated transition intensities is obtained using a multivariate counting process, and, in a second step, the product integration is used to obtain the desired estimator for the transition probabilities having useful asymptotic properties. An application of this methodology, using firstly kernel smoothing to estimate the transition intensities, is presented in [68]. An application of the product limit estimator to credit migration matrices is given in [4].
  • The work [69] is an extension of the previously referred work of Aalen, in which the state space has an arbitrary but finite number of both transient and absorbing states. A non-parametric product limit estimator is introduced and shown to be uniformly consistent.
  • The Ph.D. thesis [70] develops a method for estimating the parameters of a nonhomogeneous continuous time Markov chain discretely observed by Poisson sampling by using the Dynkin martingale. The estimators are proven to be strongly consistent and asymptotically normal. A simulation study evaluates the performance of the model against the maximum likelihood estimators with continuously observed trajectories.
  • The authors of [71] propose a kind of moment estimation procedure to estimate the parameters in the infinitesimal generator of a time-homogeneous continuous-time Markov process: a type of process that may serve as a model of, for instance, high frequency financial data. Several properties of the estimation procedure are proved, such as strong consistency, asymptotic normality, and a characterisation of standard errors.
  • The article [72] also develops a calibration procedure: by solving a quadratic objective function minimisation problem subject to linear constraints, it seeks a generator—that is, an intensity matrix—of a continuous time Markov chain having a probability transition matrix with a spectrum as close as possible to the spectrum of an estimated discrete time Markov chain transition probability matrix. In the case where the chain is embeddable, the procedure returns the generator; if not, the procedure returns a best approximation of the generator, in the sense of the minimisation problem solved. Besides testing the procedure with synthetic data, there is an application to data coming from a time series generated from a model of large-scale atmospheric flow.
  • For the content of the work [73], the authors say: “In the present work, we focus on Bayesian inference for the partially observed CTMC without using latent variables; we use the likelihood function directly by evaluation of matrix exponentials and perform posterior inference via a Metropolis–Hastings approach, where the generator matrices are fully specified and not constrained”.
  • The work [74] is a broad reference on nonhomogeneous Markov chains mainly devoted to the asymptotic behaviour of these chains encompassing some more focused works, such as [75].
In Table 1, we present a summary comparison of the main characteristics of some of the works on statistical estimation and/or calibration discussed in the text.

6. Conclusions

In this work, we provided a brief review of several important techniques used to fit Markov chain models to phenomena that comprise the evolution of a system taking values in a finite state space; special emphasis was placed on nonhomogeneous Markov chain models. This review is intended to provide a starting point for further, much needed, investigations on the subject. It is manifest that the subject of statistical estimation for nonhomogeneous Markov chains is bound to continue to receive widespread attention due to the range of domains where it finds modelling applications. It is also clear—from this work, in particular—that the methods and procedures designed for the estimation or calibration of parameters of a Markov chain must suit the type of data available; we should highlight two main types of data: either a whole trajectory observed continuously and/or at a special sequence of times, or, in the case of chains having an absorbing state, a sufficiently large set of trajectories of the chain. It is to be expected that the intensive use of computational tools will allow an even wider development of estimation and/or calibration methods.

Author Contributions

Conceptualisation, M.L.E. and N.P.K.; methodology, M.L.E.; software, M.L.E.; validation, M.L.E. and N.P.K.; formal analysis, M.L.E. and N.P.K.; investigation, M.L.E. and N.P.K.; resources, M.L.E.; data curation, M.L.E.; writing—original draft preparation, M.L.E.; writing—review and editing, M.L.E. and N.P.K.; visualisation, M.L.E. and N.P.K.; supervision, M.L.E.; project administration, M.L.E.; funding acquisition, M.L.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

This work was partially supported by national resources through the FCT–Fundação para a Ciência e a Tecnologia, I.P., under the scope of the projects UIDB/00297/2020 (https://doi.org/10.54499/UIDB/00297/2020, accessed on Monday 27 February 2025) and UIDP/00297/2020 (https://doi.org/10.54499/UIDP/00297/2020, accessed on Monday 27 February 2025) (Centre for Mathematics and Applications, Universidade Nova de Lisboa).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Christiansen, M.C. Multistate models in health insurance. AStA Adv. Stat. Anal. 2012, 96, 155–186.
2. Gill, R.D. On Estimating Transition Intensities of a Markov Process with Aggregate Data of a Certain Type: “Occurrences but No Exposures”. Scand. J. Stat. 1986, 13, 113–134.
3. Israel, R.B.; Rosenthal, J.S.; Wei, J.Z. Finding Generators for Markov Chains via Empirical Transition Matrices, with Applications to Credit Ratings. Math. Financ. 2001, 11, 245–265.
4. Lando, D.; Skødeberg, T.M. Analyzing rating transitions and rating drift with continuous observations. J. Bank. Financ. 2002, 26, 423–444.
5. Jafry, Y.; Schuermann, T. Measurement, estimation and comparison of credit migration matrices. J. Bank. Financ. 2004, 28, 2603–2639.
6. Inamura, Y. Estimating Continuous Time Transition Matrices from Discretely Observed Data; Bank of Japan Working Paper Series 06-E-7; Bank of Japan: Tokyo, Japan, 2006.
7. Möstel, L.; Pfeuffer, M.; Fischer, M. Statistical inference for Markov chains with applications to credit risk. Comput. Statist. 2020, 35, 1659–1684.
8. Bartholomew, D.J. Stochastic Models for Social Processes, 3rd ed.; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Ltd.: Chichester, UK, 1982; pp. xii+365.
9. Vassiliou, P.C.; Georgiou, A.C. Markov and Semi-Markov Chains, Processes, Systems, and Emerging Related Fields. Mathematics 2021, 9, 2490.
10. Iosifescu, M.; Tăutu, P. Stochastic Processes and Applications in Biology and Medicine. II: Models; Biomathematics; Editura Academiei: Bucharest, Romania; Springer: Berlin, Germany, 1973; Volume 4, p. 337.
11. Garg, L.; McClean, S.; Meenan, B.; Millard, P. Non-homogeneous Markov models for sequential pattern mining of healthcare data. IMA J. Manag. Math. 2009, 20, 327–344.
12. Wan, L.; Lou, W.; Abner, E.; Kryscio, R.J. A comparison of time-homogeneous Markov chain and Markov process multi-state models. Commun. Stat. Case Stud. Data Anal. Appl. 2016, 2, 92–100.
13. Chang, J.; Chan, H.K.; Lin, J.; Chan, W. Non-homogeneous continuous-time Markov chain with covariates: Applications to ambulatory hypertension monitoring. Stat. Med. 2023, 42, 1965–1980.
14. Norris, J.R. Markov Chains; Cambridge Series in Statistical and Probabilistic Mathematics; Cambridge University Press: Cambridge, UK, 1998; Volume 2, pp. xvi+237.
15. Resnick, S. Adventures in Stochastic Processes; Birkhäuser Boston, Inc.: Boston, MA, USA, 1992; pp. xii+626.
16. Suhov, Y.; Kelbert, M. Markov chains: A primer in random processes and their applications. In Probability and Statistics by Example. II; Cambridge University Press: Cambridge, UK, 2008; pp. x+487.
17. Liggett, T.M. Continuous Time Markov Processes: An Introduction; Graduate Studies in Mathematics; American Mathematical Society: Providence, RI, USA, 2010; Volume 113, pp. xii+271.
18. Levin, D.A.; Peres, Y. Markov Chains and Mixing Times, 2nd ed.; American Mathematical Society: Providence, RI, USA, 2017; pp. xvi+447.
19. Feller, W. An Introduction to Probability Theory and Its Applications. Vol. I, 3rd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 1968; pp. xviii+509.
20. Feller, W. An Introduction to Probability Theory and Its Applications. Vol. II, 2nd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 1971; pp. xxiv+669.
21. Dynkin, E.B. Theory of Markov Processes; Brown, D.E., Köváry, T., Eds.; Dover Publications, Inc.: Mineola, NY, USA, 2006; pp. xii+210.
22. Chung, K.L. Markov Chains with Stationary Transition Probabilities, 2nd ed.; Die Grundlehren der Mathematischen Wissenschaften; Springer: New York, NY, USA, 1967; Volume 104, pp. xi+301.
23. Freedman, D. Markov Chains; Springer: New York, NY, USA, 1983; pp. xiv+382.
24. Stroock, D.W. An Introduction to Markov Processes; Graduate Texts in Mathematics; Springer: Berlin, Germany, 2005; Volume 230, pp. xiv+171.
25. Kallenberg, O. Foundations of Modern Probability, 3rd ed.; Probability Theory and Stochastic Modelling; Springer: Cham, Switzerland, 2021; Volume 99, pp. xii+946.
26. Iosifescu, M.; Tăutu, P. Stochastic Processes and Applications in Biology and Medicine. I: Theory; Biomathematics; Editura Academiei RSR: Bucharest, Romania; Springer: Berlin, Germany; New York, NY, USA, 1973; Volume 3, p. 331.
27. Iosifescu, M. Finite Markov Processes and Their Applications; Wiley Series in Probability and Mathematical Statistics; John Wiley & Sons, Ltd.: Chichester, UK; Editura Tehnică: Bucharest, Romania, 1980; p. 295.
28. Rolski, T.; Schmidli, H.; Schmidt, V.; Teugels, J. Stochastic Processes for Insurance and Finance; Wiley Series in Probability and Statistics; John Wiley & Sons Ltd.: Chichester, UK, 1999; pp. xviii+654.
29. Feinberg, E.A.; Mandava, M.; Shiryaev, A.N. On solutions of Kolmogorov’s equations for nonhomogeneous jump Markov processes. J. Math. Anal. Appl. 2014, 411, 261–270.
30. Feinberg, E.A.; Shiryaev, A.N. Kolmogorov’s equations for jump Markov processes and their applications to control problems. Theory Probab. Appl. 2022, 66, 582–600.
31. Feinberg, E.; Mandava, M.; Shiryaev, A.N. Kolmogorov’s equations for jump Markov processes with unbounded jump rates. Ann. Oper. Res. 2022, 317, 587–604.
32. Slavík, A. Product Integration, Its History and Applications; Nečas Center for Mathematical Modeling; Matfyzpress: Prague, Czech Republic, 2007; Volume 1, pp. iv+147.
33. Billingsley, P. Statistical Inference for Markov Processes; Statistical Research Monographs; University of Chicago Press: Chicago, IL, USA, 1961; Volume II, pp. vii+75.
34. Doob, J.L. Stochastic Processes; John Wiley & Sons, Inc.: New York, NY, USA, 1953; pp. viii+654.
35. Kingman, J.F.C. The imbedding problem for finite Markov chains. Z. Wahrscheinlichkeitstheorie Und Verw. Geb. 1962, 1, 14–24.
36. Johansen, S. Some results on the imbedding problem for finite Markov chains. J. Lond. Math. Soc. 1974, 8, 345–351.
37. Lencastre, P.; Raischel, F.; Rogers, T.; Lind, P.G. From empirical data to time-inhomogeneous continuous Markov processes. Phys. Rev. E 2016, 93, 032135.
38. Böttcher, B. Embedded Markov chain approximations in Skorokhod topologies. Probab. Math. Statist. 2019, 39, 259–277.
39. Billingsley, P. Statistical methods in Markov chains. Ann. Math. Statist. 1961, 32, 12–40.
40. Prakasa Rao, B.L.S. Maximum likelihood estimation for Markov processes. Ann. Inst. Statist. Math. 1972, 24, 333–345.
41. Prakasa Rao, B.L.S. Moderate deviation principle for maximum likelihood estimator for Markov processes. Statist. Probab. Lett. 2018, 132, 74–82.
42. Foutz, R.V.; Srivastava, R.C. Statistical inference for Markov processes when the model is incorrect. Adv. Appl. Probab. 1979, 11, 737–749.
43. Bladt, M.; Sørensen, M. Statistical inference for discretely observed Markov jump processes. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 395–410.
44. Mitrophanov, A.Y. Stability and exponential convergence of continuous-time Markov chains. J. Appl. Probab. 2003, 40, 970–979.
45. Mitrophanov, A.Y. The spectral gap and perturbation bounds for reversible continuous-time Markov chains. J. Appl. Probab. 2004, 41, 1219–1222.
46. Mitrofanov, A.Y. Stability estimates for continuous-time finite homogeneous Markov chains. Teor. Veroyatn. Primen. 2005, 50, 371–379.
47. Mitrophanov, A.Y. Ergodicity coefficient and perturbation bounds for continuous-time Markov chains. Math. Inequal. Appl. 2005, 8, 159–168.
48. Waters, H.R. An approach to the study of multiple state models. J. Inst. Actuar. 1984, 111, 363–374.
49. Haberman, S.; Pitacco, E. Actuarial Models for Disability Insurance; Chapman & Hall/CRC: Boca Raton, FL, USA, 1999; pp. xx+280.
50. Olivieri, A.; Pitacco, E. Introduction to Insurance Mathematics, 2nd ed.; European Actuarial Academy (EAA) Series; Springer: Cham, Switzerland, 2015; pp. xviii+508.
51. Mitrophanov, A.Y. The Arsenal of Perturbation Bounds for Finite Continuous-Time Markov Chains: A Perspective. Mathematics 2024, 12, 1608.
52. Esquível, M.L.; Krasii, N.P.; Guerreiro, G.R. Estimation–Calibration of Continuous-Time Non-Homogeneous Markov Chains with Finite State Space. Mathematics 2024, 12, 668.
53. Zhao, M.d.; Shi, Z.y.; Yang, W.g.; Wang, B. A class of strong deviation theorems for the sequence of real valued random variables with respect to continuous-state non-homogeneous Markov chains. Comm. Statist. Theory Methods 2021, 50, 5475–5487.
54. Esquível, M.L.; Guerreiro, G.R.; Oliveira, M.C.; Corte Real, P. Calibration of Transition Intensities for a Multistate Model: Application to Long-Term Care. Risks 2021, 9, 37.
55. Esquível, M.L.; Krasii, N.P.; Guerreiro, G.R. Open Markov Type Population Models: From Discrete to Continuous Time. Mathematics 2021, 9, 1496.
56. Dickson, D.C.M.; Hardy, M.R.; Waters, H.R. Actuarial Mathematics for Life Contingent Risks, 3rd ed.; International Series on Actuarial Science; Cambridge University Press: Cambridge, UK, 2020; pp. xxiv+759.
57. Wolthuis, H. Life Insurance Mathematics (The Markovian Model); CAIRE: Brussels, Belgium, 1994; pp. xi+288.
58. Forfar, D.O.; McCutcheon, J.J.; Wilkie, A.D. On graduation by mathematical formula. J. Inst. Actuar. 1988, 115, 1–149.
59. Haberman, S.; Renshaw, A.E. Generalized Linear Models and Actuarial Science. J. R. Stat. Soc. Ser. D (The Statistician) 1996, 45, 407–436.
60. Renshaw, A.; Haberman, S. On the graduations associated with a multiple state model for permanent health insurance. Insur. Math. Econ. 1995, 17, 1–17.
61. Fong, J.H.; Shao, A.W.; Sherris, M. Multistate actuarial models of functional disability. N. Am. Actuar. J. 2015, 19, 41–59.
62. de Mol van Otterloo, S.; Alonso-García, J. A multi-state model for sick leave and its impact on partial early retirement incentives: The case of the Netherlands. Scand. Actuar. J. 2023, 2023, 244–268.
63. Naka, P.; Boado-Penas, M.d.C.; Lanot, G. A multiple state model for the working-age disabled population using cross-sectional data. Scand. Actuar. J. 2020, 2020, 700–717.
64. Brémaud, P. Markov Chains—Gibbs Fields, Monte Carlo Simulation and Queues, 2nd ed.; Texts in Applied Mathematics; Springer: Cham, Switzerland, 2020; Volume 31, pp. xvi+557.
65. Aalen, O.O.; Johansen, S. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand. J. Statist. 1978, 5, 141–150.
66. Friedman, C.N. Product integration and solution of ordinary differential equations. J. Math. Anal. Appl. 1984, 102, 509–518.
67. Johansen, S. The product limit estimator as maximum likelihood estimator. Scand. J. Stat. 1978, 5, 195–199.
68. Keiding, N.; Andersen, P.K. Nonparametric Estimation of Transition Intensities and Transition Probabilities: A Case Study of a Two-State Markov Process. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1989, 38, 319–329.
69. Fleming, T.R.; Harrington, D.P. Estimation for discrete time nonhomogeneous Markov chains. Stoch. Process. Appl. 1978, 7, 131–139.
70. Cramer, R.D. Parameter Estimation for Discretely Observed Continuous-Time Markov Chains. Ph.D. Thesis, Rice University, Houston, TX, USA, 2001.
71. Duffie, D.; Glynn, P. Estimation of continuous-time Markov processes sampled at random time intervals. Econometrica 2004, 72, 1773–1808.
72. Crommelin, D.; Vanden-Eijnden, E. Fitting timeseries by continuous-time Markov chains: A quadratic programming approach. J. Comput. Phys. 2006, 217, 782–805.
73. Riva-Palacio, A.; Mena, R.H.; Walker, S.G. On the estimation of partially observed continuous-time Markov chains. Comput. Statist. 2023, 38, 1357–1389.
74. Vassiliou, P.C.G. Non-Homogeneous Markov Chains and Systems—Theory and Applications; CRC Press: Boca Raton, FL, USA, 2023; pp. xxi+450.
75. Vassiliou, P.C.G. Laws of large numbers for non-homogeneous Markov systems with arbitrary transition probability matrices. J. Stat. Theory Pract. 2022, 16, 18.
Figure 1. An example of a set of trajectories of a four-state continuous time Markov chain.
Table 1. Summary table of the main characteristics of some of the cited works.

H or nH (a) | Articles | F/iF, Dt/Ct (b) | TrI/Pr (c) | Est/Cal (d) | App (e) | Cmp/Sim (f) | Data (g)
--- | --- | --- | --- | --- | --- | --- | ---
nH | [65] | F & Ct | TrI | Cst & As | | | FtjCo
nH | [67] | F & Ct | TrI | MLE | | | FtjCo
nH | [69] | F & Dt | TrI | MLE & Cst & As | | Cmp | TrjsDo
nH | [68] | F & Ct | TrI & Pr | K & K | H | Cmp | FtjCo
nH | [70] | iF | TrI | PSam & Cst & As | | Sim | TrjsDo
nH | [55] | F & Ct | TrI | Cal | LTC | |
nH | [75] | F & Ct | TrI | MLE | H | | TrjsDo
nH | [52] | F & Ct | TrI | nP & Cal | | Sim | TrjsDo
H | [39] | F & iF & Dt & Ct | TrI & Pr | MLE & Cst & As | | |
H | [33] | F & iF & Dt & Ct | TrI & Pr | MLE & Cst & As | | |
H | [40] | iF & Dt | Pr | MLE & Cst & As | | | TrjsDo
H | [42] | iF & Dt | Pr | MLE & Cst & As | | | TrjsDo
H | [71] | F & iF & Ct & Dt | TrI & Df | ME & Cst & As | | | SmpDo
H | [73] | F & Ct | TrI | MLE & Bay | | Sim | TrjsDo
H | [41] | iF & Dt | Pr | MLE & MD | | | TrjsDo

(a) H—homogeneous Markov chain; nH—nonhomogeneous Markov chain. (b) F—finite state space; iF—infinite state space; Dt—discrete time; Ct—continuous time. (c) TrI—estimation of transition intensities, i.e., of the generator; Pr—estimation of probability densities; Df—estimation of parameters of a diffusion. (d) Est—estimation: maximum likelihood estimation (MLE), moment estimators (ME), statistical inference on estimators: consistency (Cst), asymptotic properties (As), kernel estimation (K), other non-parametric (nP), Poisson sampling (PSam); Bay—Bayesian inference; MD—moderate deviations; Cal—calibration of intensities. (e) App—with applications: to credit (Crd), to high frequency data (HFD), to long-term care (LTC); other health (H). (f) Cmp—comparison studies with other methods; Sim—simulation studies to assess methods. (g) Data type: TrjsDo—set of trajectories discretely observed; FtjCo—set of full trajectories observed continuously; SmpDo—set of trajectories sampled at random times.
