**1. Introduction**

Estimating the characteristics of models is a widespread and, at the same time, important problem in science. This problem arises in applications with unknown parameters, which have to be estimated from real data sets. In particular, such problems have turned out to be fundamental in machine learning procedures [1–5]. The core of these procedures is a parametrized model trained by statistically estimating the unknown parameters from real data. Most of the econometric problems associated with reconstructing functional relations and forecasting also reduce to estimating model parameters; for example, see [6,7].

The problems described above are solved using traditional mathematical statistics methods, such as the maximum likelihood method and its derivatives, the method of moments, Bayesian methods, and their numerous modifications [8,9].

Among the mathematical tools for parametric estimation mentioned, a special place is occupied by entropy maximization methods for finite-dimensional probability distributions [10,11].

Consider a random variable *x* taking discrete values *x*1, ... , *xn* with probabilities *p*1, ... , *pn*, respectively, and *r* functions *f*1(*x*), ... , *fr*(*x*) of this variable with discrete values. The discrete probability distribution **p** = {*p*1, ... , *pn*} is defined as the solution of the problem

$$H(\mathbf{p}) = -\sum\_{i=1}^{n} p\_i \ln p\_i \Rightarrow \max, \qquad \sum\_{i=1}^{n} p\_i f\_k(x\_i) \le q\_k, \quad k = 1, \dots, r,$$

**Citation:** Popkov, Y.S. Qualitative Properties of Randomized Maximum Entropy Estimates of Probability Density Functions. *Mathematics* **2021**, *9*, 548. https://doi.org/10.3390/ math9050548

Academic Editors: Mikhail Posypkin, Andrey Gorshenin and Vladimir Titarev

Received: 28 January 2021 Accepted: 2 March 2021 Published: 5 March 2021



where *q*1,..., *qr* are given constants.

If *f<sub>k</sub>*(*x<sub>i</sub>*) ≡ *x<sub>i</sub><sup>k</sup>*, then the system of equalities specifies constraints on the *k*th moments of the discrete random variable *x*. In the case of equality constraints, some modifications of this problem adapted to different applications were studied in [10–13]. Since this problem is conditionally extremal, it can be solved using the Lagrange method, which leads to a system of equations for the Lagrange multipliers. The latter often turn out to be substantially nonlinear functions; hence, rather sophisticated techniques are needed for their numerical calculation [14,15].

In the case of inequality constraints, this problem belongs to the class of mathematical programming problems [16].
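As a concrete illustration (not part of the original text), the discrete entropy maximization problem above can be solved with a general-purpose optimizer. The values of *x*, the single first-moment constraint imposed as an equality, and the use of SciPy's SLSQP method are all assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical instance: x takes the values 1..5, and a single constraint
# fixes the first moment (k = 1, f_1(x) = x) at q_1 = 3.2.
x = np.arange(1, 6, dtype=float)
q1 = 3.2

def neg_entropy(p):
    # -H(p) = sum_i p_i ln p_i  (clipping guards against log 0)
    p = np.clip(p, 1e-12, None)
    return float(np.sum(p * np.log(p)))

constraints = [
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},  # normalization
    {"type": "eq", "fun": lambda p: p @ x - q1},       # first-moment constraint
]
res = minimize(neg_entropy, np.full(5, 0.2), method="SLSQP",
               bounds=[(0.0, 1.0)] * 5, constraints=constraints)
p_star = res.x  # maximum entropy distribution over x_1, ..., x_5
```

With only the first moment constrained, the maximizer has the Gibbs form *p<sub>i</sub>* ∝ exp(*λx<sub>i</sub>*) for a single Lagrange multiplier *λ*, which is what the Lagrange method mentioned above produces analytically.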

The entropy maximization principle is adopted to estimate the parameters of a priori distributions when constructing Bayesian estimates [17,18] or maximum likelihood estimates.

The parameters of probability distributions (continuous or discrete) can be estimated using various mathematical statistics methods, including the method of entropy maximization. Their efficiency in hydrological problems was compared in [19]. Apparently, the method of entropy maximization yields the best results in such problems due to the structure of hydrological data.

The problem of estimating model characteristics from real data was further developed with the appearance of new machine learning methods, called randomized machine learning (RML) [20]. They are based on models with random parameters, and it is necessary to estimate the probability density functions of these parameters. The estimation algorithm (the RML algorithm) is formulated in terms of functional entropy-linear programming [21].

The original statement of this problem was to estimate probability density functions (PDFs) in RML procedures. More recently, however, it has been considered in a more general context: the method of maximizing entropy functionals for constructing estimates of continuous probability density functions from real data (randomized maximum entropy (RME) estimation).

In this paper, the general RME estimation problem is formulated; its solutions, numerical algorithms, and the asymptotic properties of the solutions are studied. The theoretical results are illustrated by an important application—estimating the evolution of the thermokarst lake area in Western Siberia.

## **2. Statement of the RME Estimation Problem**

Consider a scalar continuous function *ϕ*(*x*, *θ*) with parameters *θ* = {*θ*1, ... , *θn*}. Assume that this function is a characteristic of an object's model with an input *x* and an output *y*ˆ. Let **x**(*r*) = {*x*[1], ... , *x*[*r*]} and **y**(*r*) = {*y*[1], ... , *y*[*r*]} be given measurements at the time points *t* = 1, ... , *r*. Note that the latter measurements are obtained with random vector errors *ξ* = {*ξ*[1], ... , *ξ*[*r*]}, which are generally different for different time points.

Thus, after *r* measurements, the model and observations are described by the equations

$$\hat{\mathbf{y}} = \Gamma(\mathbf{x}^{(r)}, \theta), \qquad \hat{\mathbf{v}} = \hat{\mathbf{y}} + \xi, \tag{1}$$

where the vector function Γ(**x**(*r*), *θ*) has the components *ϕ*(*x*[*t*], *θ*), *t* = 1, ... , *r*; **v**ˆ denotes the observed output of the model, containing the measurement noises of the object's output.

Let us introduce a series of assumptions necessary for further considerations.


• The random noise is *ξ* ∈ Ξ ⊂ *R<sup>r</sup>*, where

$$
\Xi = \bigotimes\_{t=1}^{r} \Xi\_{t}, \qquad \Xi\_{t} = [\xi\_{t}^{-}, \xi\_{t}^{+}]. \tag{2}
$$

• The PDF *Q*(*ξ*) of the measurement noises is continuously differentiable on the support Ξ and also has the multiplicative structure

$$Q(\xi) = \prod\_{t=1}^{r} Q\_t(\xi[t]). \tag{3}$$

The estimation problem is stated as follows: Find the estimates of the PDFs *P*∗(*θ*) and *Q*∗(*ξ*) that maximize the generalized information entropy functional

$$\mathcal{H}[P(\theta), Q(\xi)] = -\int\_{\Theta} P(\theta) \ln P(\theta)\, d\theta - \sum\_{t=1}^{r} \int\_{\Xi\_{t}} Q\_t(\xi[t]) \ln Q\_t(\xi[t])\, d\xi[t] \Rightarrow \max \tag{4}$$

subject to

—the normalization conditions of the PDFs given by

$$\int\_{\Theta} P(\theta)d\theta = 1; \quad \int\_{\Xi\_t} Q\_{t}(\xi[t])d\xi[t] = 1, \quad t = 1, \ldots, r;\tag{5}$$

and

—the empirical balance conditions

$$\begin{array}{rcl}\Phi[P(\theta),Q(\xi)] &=& \mathbf{y}^{(r)},\\ \Phi[P(\theta),Q(\xi)] &=& \{\Phi\_1[P(\theta),Q(\xi)],\ldots,\Phi\_r[P(\theta),Q(\xi)]\},\\ \Phi\_t[P(\theta),Q(\xi)] &=& \displaystyle\int\_{\Theta} \phi(\mathbf{x}[t],\theta)P(\theta)\,d\theta + \int\_{\Xi\_t} \xi[t]\, Q\_t(\xi[t])\, d\xi[t], \quad t=1,\ldots,r,\end{array} \tag{6}$$

where **y**(*r*) = {*y*[1], ... , *y*[*r*]} are the measured data on the object's output. We will refer to problem (4)–(6) as the RME estimation problem.

Problem (4)–(6) is of the Lyapunov type [23,24], as it has an integral objective functional and integral constraints.
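For intuition about the statement above, problem (4)–(6) can be attacked directly by discretizing Θ and each Ξ*t* and maximizing the discretized entropy subject to the normalization and balance constraints. The linear model *ϕ*(*x*, *θ*) = *θx*, the supports, the toy data, and the SciPy SLSQP solver are all assumptions of this sketch:

```python
import numpy as np
from scipy.optimize import minimize

# Discretized sketch of the RME problem (4)-(6) for a hypothetical model
# phi(x, theta) = theta * x, Theta = [0, 1], Xi_t = [-1/2, 1/2], r = 2.
m, n, r = 41, 21, 2
theta = (np.arange(m) + 0.5) / m      # midpoint grid on Theta
dth = 1.0 / m
xi = -0.5 + (np.arange(n) + 0.5) / n  # midpoint grid on each Xi_t
dxi = 1.0 / n
x_obs = np.array([1.0, 2.0])          # toy inputs x[1], x[2]
y_obs = np.array([0.55, 1.10])        # toy observations y[1], y[2]

def unpack(v):
    return v[:m], v[m:].reshape(r, n)  # density values of P and of each Q_t

def neg_entropy(v):                    # minus the functional (4), discretized
    P, Q = unpack(v)
    P = np.clip(P, 1e-12, None); Q = np.clip(Q, 1e-12, None)
    return float(np.sum(P * np.log(P)) * dth + np.sum(Q * np.log(Q)) * dxi)

cons = [{"type": "eq", "fun": lambda v: np.sum(unpack(v)[0]) * dth - 1.0}]
for t in range(r):
    cons.append({"type": "eq",         # normalization conditions (5)
                 "fun": lambda v, t=t: np.sum(unpack(v)[1][t]) * dxi - 1.0})
    cons.append({"type": "eq",         # empirical balance conditions (6)
                 "fun": lambda v, t=t:
                     np.sum(theta * x_obs[t] * unpack(v)[0]) * dth
                     + np.sum(xi * unpack(v)[1][t]) * dxi - y_obs[t]})

res = minimize(neg_entropy, np.ones(m + r * n), method="SLSQP",
               bounds=[(0.0, None)] * (m + r * n), constraints=cons,
               options={"maxiter": 500})
P_star, Q_star = unpack(res.x)
```

This brute-force primal route scales poorly with *r*; the Lagrange-multiplier machinery developed in the following sections reduces the same problem to *r* scalar unknowns.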

## **3. Optimality Conditions**

The optimality conditions in optimization problems of the Lyapunov type are formulated in terms of Lagrange multipliers. In addition, the Gâteaux derivatives of the problem's functionals are used [25].

The Lagrange functional is defined by

$$\begin{split} \mathcal{L}[P(\theta), Q(\xi), \mu, \eta, \lambda] &= \mathcal{H}[P(\theta), Q(\xi)] + \mu \Big(1 - \int\_{\Theta} P(\theta) d\theta \Big) + \\ &+ \sum\_{t=1}^{r} \eta\_t \Big(1 - \int\_{\Xi\_t} Q\_t(\xi[t]) d\xi[t] \Big) + \\ &+ \sum\_{t=1}^{r} \lambda\_t \Big(y[t] - \int\_{\Theta} \phi(\mathbf{x}[t], \theta) P(\theta) d\theta - \int\_{\Xi\_t} \xi[t]\, Q\_t(\xi[t]) d\xi[t] \Big). \end{split} \tag{7}$$

Let us recall the technique for obtaining optimality conditions in terms of the Gâteaux derivatives [26].

The PDFs *P*(*θ*) and *Qt*(*ξ*[*t*]), (*t* = 1, ... ,*r*), are continuously differentiable, i.e., belong to the class *C*1. Choosing arbitrary functions *h*(*θ*) and *wt*(*ξ*[*t*]), (*t* = 1, ... ,*r*), from this class, we represent the PDFs as

$$P(\theta) = P^\*(\theta) + \alpha h(\theta); \quad Q\_t(\xi[t]) = Q\_t^\*(\xi[t]) + \beta\_t w\_t(\xi[t]), \quad t = 1, \dots, r,$$

where the PDFs *P*<sup>∗</sup>(*θ*) and *Q<sub>t</sub>*<sup>∗</sup>(*ξ*[*t*]) are the solutions of problem (4)–(6), and *α* and *β*1, ... , *βr* are parameters.

Next, we substitute the above representations of the PDFs into (7). If all functions from *C*<sup>1</sup> are assumed to be fixed, the Lagrange functional depends on the parameters *α* and *β*1,..., *βr*. Then, the first-order optimality conditions for the functional (7) in terms of the Gâteaux derivative take the form

$$\left. \frac{\partial \mathcal{L}}{\partial \alpha} \right|\_{(\alpha,\beta)=0} = 0, \quad \left. \frac{\partial \mathcal{L}}{\partial \beta\_t} \right|\_{(\alpha,\beta)=0} = 0, \quad t = 1, \dots, r.$$

These conditions lead to the following system of integral equations:

$$\int\_{\Theta} h(\theta)\, \Omega(\theta)\, d\theta = 0, \quad \int\_{\Xi\_t} w\_t(\xi[t])\, \Upsilon\_t(\xi[t])\, d\xi[t] = 0, \quad t = 1, \dots, r,$$

which are satisfied for any functions *h*(*θ*) and *w*1(*ξ*[1]), ... , *wr*(*ξ*[*r*]) from *C*<sup>1</sup> if and only if

$$
\Omega(\theta) = 0, \quad \Upsilon\_t(\xi[t]) = 0, \quad t = 1, \dots, r.
$$

The optimality conditions for problems (4)–(6) are given by

$$\Omega(\theta) = \ln P^\*(\theta) + 1 - \mu + \sum\_{t=1}^{r} \lambda\_t\, \phi(\mathbf{x}[t], \theta) = 0,\tag{8}$$

$$\Upsilon\_t(\xi[t]) = \ln Q\_t^\*(\xi[t]) + 1 - \eta\_t + \lambda\_t\, \xi[t] = 0, \quad t = 1, \ldots, r. \tag{9}$$
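To make the passage from the stationarity conditions to the densities below explicit (using the sign convention of (10) and (11)): exponentiating the condition for *P* and enforcing the normalization (5) eliminates the multiplier *μ*,

```latex
\ln P^*(\theta) = \mu - 1 - \sum_{j=1}^{r} \lambda_j \,\phi(\mathbf{x}[j],\theta)
\;\Longrightarrow\;
P^*(\theta) = e^{\mu-1} \exp\Big(-\sum_{j=1}^{r} \lambda_j\, \phi(\mathbf{x}[j],\theta)\Big),
\qquad
\int_{\Theta} P^*(\theta)\, d\theta = 1
\;\Longrightarrow\;
e^{\mu-1} = \frac{1}{\mathcal{P}(\lambda)}.
```

The multipliers *η<sub>t</sub>* are eliminated in the same way, which yields (10) and (11).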

Hence, the entropy-optimal PDFs of the model parameters and measurement noises have the form

$$P^\*(\theta \mid \mathbf{y}^{(r)}, \mathbf{x}^{(r)}) = \frac{\exp\left(-\sum\_{j=1}^r \lambda\_j(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})\, \phi(\mathbf{x}[j], \theta)\right)}{\mathcal{P}(\lambda(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))},$$

$$Q\_t^\*(\xi[t] \mid \mathbf{y}^{(r)}, \mathbf{x}^{(r)}) = \frac{\exp\left(-\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})\, \xi[t]\right)}{\mathcal{Q}\_t(\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))}, \quad t = 1, \ldots, r,\tag{10}$$

where

$$\mathcal{P}(\lambda(\mathbf{y}^{(r)},\mathbf{x}^{(r)})) = \int\_{\Theta} \exp\left(-\sum\_{j=1}^{r} \lambda\_{j}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})\, \phi(\mathbf{x}[j],\theta)\right) d\theta,$$

$$\mathcal{Q}\_t(\lambda\_{t}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})) = \int\_{\Xi\_{t}} \exp\left(-\lambda\_{t}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})\, \xi[t]\right) d\xi[t], \quad t = 1,\ldots,r. \tag{11}$$

Due to equalities (10) and (11), the entropy-optimal PDFs are parametrized by the Lagrange multipliers *λ*1, ... , *λr*, which represent the solutions of the empirical balance equations

$$\frac{\mathcal{G}\_t(\lambda(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))}{\mathcal{P}(\lambda(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))} + \frac{\mathcal{E}\_t(\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))}{\mathcal{Q}\_t(\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))} = y[t], \quad t = 1, \ldots, r,\tag{12}$$

where

$$\mathcal{G}\_t(\lambda(\mathbf{y}^{(r)},\mathbf{x}^{(r)})) = \int\_{\Theta} \phi(\mathbf{x}[t],\theta) \exp\left(-\sum\_{j=1}^{r} \lambda\_{j}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})\, \phi(\mathbf{x}[j],\theta)\right) d\theta,$$

$$\mathcal{E}\_{t}(\lambda\_{t}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})) = \int\_{\Xi\_{t}} \xi[t]\, \exp\left(-\lambda\_{t}(\mathbf{y}^{(r)},\mathbf{x}^{(r)})\, \xi[t]\right) d\xi[t], \quad t = 1,\ldots,r. \tag{13}$$

The solution *λ*∗(**y**(*r*), **x**(*r*)) of these equations depends on the sample (**y**(*r*), **x**(*r*)) used for constructing the RME estimates of the PDFs.
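In practice, Equations (12) are solved numerically. A minimal sketch (all model choices invented for illustration): the linear model *ϕ*(*x*, *θ*) = *θx* on Θ = [0, 1], noise supports Ξ*t* = [−1/2, 1/2], *r* = 2, with the integrals (11) and (13) evaluated by midpoint quadrature and the system handed to a root finder:

```python
import numpy as np
from scipy.optimize import root

n = 2000
theta = (np.arange(n) + 0.5) / n       # midpoint grid on Theta = [0, 1]
dth = 1.0 / n
xi = -0.5 + (np.arange(n) + 0.5) / n   # midpoint grid on Xi_t = [-1/2, 1/2]
dxi = 1.0 / n
x_obs = np.array([1.0, 2.0])           # toy inputs x[1], x[2]
y_obs = np.array([0.55, 1.10])         # toy outputs y[1], y[2]

def balance(lam):
    # kernel exp(-sum_j lam_j * phi(x[j], theta)) with phi = theta * x
    s = np.exp(-theta * (lam @ x_obs))
    P = np.sum(s) * dth                            # P(lambda), Equation (11)
    res = np.empty(2)
    for t in range(2):
        G = np.sum(theta * x_obs[t] * s) * dth     # G_t, Equation (13)
        e = np.exp(-lam[t] * xi)
        Q = np.sum(e) * dxi                        # Q_t, Equation (11)
        E = np.sum(xi * e) * dxi                   # E_t, Equation (13)
        res[t] = G / P + E / Q - y_obs[t]          # balance Equation (12)
    return res

sol = root(balance, np.zeros(2))
lam = sol.x
# Entropy-optimal parameter PDF (10), normalized on the grid:
P_star = np.exp(-theta * (lam @ x_obs))
P_star /= np.sum(P_star) * dth
```

The computed *λ*∗ then plugs into (10); for this linear model, the entropy-optimal parameter PDF is a truncated exponential density on Θ.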

## **4. Existence of an Implicit Function**

The second term in the balance Equation (12) is the mean value of the noise in measurement *t*. The noises and their characteristics are often assumed to be identical across the measurements:

$$\xi^- \le \xi[t] \le \xi^+, \quad t = 1, \dots, r. \tag{14}$$

Therefore, the mean value of the noise is given by

$$\bar{\xi} = \frac{\mathcal{E}\_t(\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))}{\mathcal{Q}\_t(\lambda\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}))}, \quad \xi^- \le \bar{\xi} \le \xi^+. \tag{15}$$
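A compact way to evaluate (15) (an elementary consequence of (11) and (13), added here for convenience): E*t* is minus the derivative of Q*t*, so the mean noise is a logarithmic derivative,

```latex
\mathcal{E}_t(\lambda_t) = \int_{\Xi_t} \xi[t]\, e^{-\lambda_t \xi[t]}\, d\xi[t]
  = -\frac{d\,\mathcal{Q}_t(\lambda_t)}{d\lambda_t},
\qquad
\bar{\xi} = \frac{\mathcal{E}_t(\lambda_t)}{\mathcal{Q}_t(\lambda_t)}
  = -\frac{d}{d\lambda_t} \ln \mathcal{Q}_t(\lambda_t).
```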

The balance equations can be written as

$$\mathcal{W}\_{t}(\lambda \mid \tilde{y}[t], \mathbf{x}^{(r)}) = \int\_{\Theta} \left(\phi(\mathbf{x}[t], \theta) - \tilde{y}[t]\right) \exp\left(-\sum\_{j=1}^{r} \lambda\_{j}(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})\, \phi(\mathbf{x}[j], \theta)\right) d\theta = 0, \quad t = 1, \dots, r,\tag{16}$$

where

$$
\tilde{y}[t] = y[t] - \bar{\xi}, \qquad \tilde{\mathbf{y}}^{(r)} = \{\tilde{y}[1], \dots, \tilde{y}[r]\}. \tag{17}
$$

In the vector form, Equation (16) is described by

$$\mathbf{W}(\lambda \mid \mathbf{\bar{y}}^{(r)}, \mathbf{x}^{(r)}) = \mathbf{0}.\tag{18}$$

Equation (18) defines an implicit function *λ*(**y**˜(*r*), **x**(*r*)). The existence and properties of this implicit function depend on the properties of the Jacobian matrix

$$J\_{\lambda}(\lambda \mid \mathbf{\tilde{y}}^{(r)}, \mathbf{x}^{(r)}) = \left[ \frac{\partial W\_{t}}{\partial \lambda\_{i}} \mid (t, i) = 1, \dots, r \right],\tag{19}$$

which has the elements

$$\frac{\partial \mathcal{W}\_t}{\partial \lambda\_i} = -\int\_{\Theta} \left( \phi(\mathbf{x}[t], \theta) - \tilde{y}[t] \right) \phi(\mathbf{x}[i], \theta) \exp \left( - \sum\_{j=1}^r \lambda\_j\, \phi(\mathbf{x}[j], \theta) \right) d\theta. \tag{20}$$

**Theorem 1.** *Let the following conditions hold:*


$$\det J\_{\lambda}(\lambda \mid \bar{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)}) \neq 0,\tag{21}$$

$$\lim\_{\|\lambda\|\to\infty} \left\| \mathbf{W}(\lambda \mid \tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)}) \right\| = \infty. \tag{22}$$

*Then, there exists a unique implicit function λ*(**y**˜(*r*), **x**(*r*)) *defined on R<sup>r</sup>* × *R<sup>r</sup>.*

**Proof of Theorem 1.** Due to the first assumption, the continuous function **W**(*λ* | **y**˜(*r*), **x**(*r*)) induces the vector field Φ(**y**˜(*r*),**x**(*r*))(*λ*) = **W**(*λ* | **y**˜(*r*), **x**(*r*)) in the space *R<sup>r</sup>* × *R<sup>r</sup>*.

We choose an arbitrary vector **u** in *R<sup>r</sup>* and define the vector field

$$\Pi\_{\mathbf{u}}(\lambda) = \Phi\_{(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})}(\lambda) - \mathbf{u}.$$

By condition (22), the field Π**u**(*λ*) with a fixed vector **u** has no zeros on the spheres ‖*λ*‖ = *ρ* of a sufficiently large radius *ρ*. Hence, the rotation of the field is well defined on these spheres. For details, see [27].

Consider the two vector fields

$$
\Pi\_{\mathbf{u}^{(1)}}(\lambda) = \Phi\_{(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})}(\lambda) - \mathbf{u}^{(1)}, \quad \Pi\_{\mathbf{u}^{(2)}}(\lambda) = \Phi\_{(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})}(\lambda) - \mathbf{u}^{(2)}.
$$

These vector fields are homotopic on the spheres of a sufficiently large radius, i.e., the field

$$\Pi\_{\alpha}(\lambda) = \alpha\, \Pi\_{\mathbf{u}^{(1)}}(\lambda) + (1 - \alpha)\, \Pi\_{\mathbf{u}^{(2)}}(\lambda) = \Phi\_{(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})}(\lambda) - \left[\alpha\, \mathbf{u}^{(1)} + (1 - \alpha)\, \mathbf{u}^{(2)}\right]$$

has no zeros on the spheres of a sufficiently large radius for any *α* ∈ [0, 1]. Homotopic fields have identical rotations [27]:

$$\gamma(\Pi\_{\mathbf{u}^{(1)}}(\lambda)) = \gamma(\Pi\_{\mathbf{u}^{(2)}}(\lambda)).$$

The vector fields Π**u**(1)(*λ*) and Π**u**(2)(*λ*) are nondegenerate on the spheres of a sufficiently large radius; in the ball ‖*λ*‖ ≤ *ρ*<sub>1</sub> < *ρ*, however, each of them may have a number of singular points. We denote by *κ*(**u**(1)) and *κ*(**u**(2)) the numbers of singular points of the vector fields Π**u**(1)(*λ*) and Π**u**(2)(*λ*), respectively. As the vector fields are homotopic,

$$\kappa(\mathbf{u}^{(1)}) = \kappa(\mathbf{u}^{(2)}) = \kappa.$$

In view of (21), these singular points are isolated. Now, let us utilize the index of a singular point introduced in [27]:

$$\operatorname{ind}(\lambda^0) = (-1)^{\beta(\lambda^0)},$$

where *β*(*λ*<sup>0</sup>) is the number of eigenvalues of the matrix Π′**u**(*λ*<sup>0</sup>) = *Jλ*(*λ*<sup>0</sup> | **y**˜(*r*), **x**(*r*)) with a negative real part. By definition, the value of this index depends not on the magnitude of *β*(*λ*<sup>0</sup>) but on its parity. Due to condition (21), all singular points have the same parity. Indeed, det *Jλ*(*λ*<sup>0</sup> | **y**˜(*r*), **x**(*r*)) ≠ 0, and hence, for any (**y**˜(*r*), **x**(*r*)) ∈ *R<sup>r</sup>* × *R<sup>r</sup>*, the eigenvalues of the matrix *Jλ*(*λ*<sup>0</sup> | **y**˜(*r*), **x**(*r*)) may move from the left half-plane to the right one in pairs only: real eigenvalues are transformed into pairs of complex-conjugate ones before passing through the imaginary axis.

In view of this fact, the rotation of the homotopic fields is given by

$$
\gamma(\Pi\_{\mathbf{u}}) = \kappa\,(-1)^{\beta},
$$

where *β* is the number of eigenvalues of the matrix Π′**u**(*λ*) with a negative real part at some singular point.

It remains to demonstrate that the vector field Π**u**(*λ*) has a unique singular point in the ball ‖*λ*‖ ≤ *ρ*<sub>1</sub> < *ρ*. Consider the equation

$$\Pi\_{\mathbf{u}}(\lambda) = \Phi\_{(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})}(\lambda) - \mathbf{u} = 0.$$

Assume that for each fixed pair (**y**˜(*r*), **x**(*r*)), this equation has *κ* singular points, i.e., the functions *λ*(1)(**y**˜(*r*), **x**(*r*)), ... , *λ*(*κ*)(**y**˜(*r*), **x**(*r*)). Therefore, it defines a multivalued function *λ*(**y**˜(*r*), **x**(*r*)), whose *κ* branches are isolated (the latter property follows from the isolation of the singular points). Due to condition (21), each of the branches *λ*(*i*)(**y**˜(*r*), **x**(*r*)) defines an open set in the space *R<sup>r</sup>* , and

$$\bigcup\_{i=1}^{\kappa} \lambda^{(i)}(\bar{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)}) = \mathbb{R}^r.$$

This is possible if and only if *<sup>κ</sup>* <sup>=</sup> 1. Hence, for each pair (**y**˜(*r*), **<sup>x</sup>**(*r*)) from *<sup>R</sup><sup>r</sup>* <sup>×</sup> *Rr* , there exists a unique function *<sup>λ</sup>*∗(**y**˜(*r*), **<sup>x</sup>**(*r*)) for which the function **<sup>W</sup>**(*<sup>λ</sup>* <sup>|</sup> **<sup>y</sup>**˜(*r*), **<sup>x</sup>**(*r*)) will vanish.

**Theorem 2.** *Under the assumptions of Theorem 1, the function λ*(**y**˜(*r*), **x**(*r*)) *is real analytic in all variables.*

**Proof of Theorem 2.** From (16), it follows that the function **W**(*λ* | **y**˜(*r*), **x**(*r*)) is analytic in all variables. Therefore, the left-hand side of Equation (18) can be expanded into a generalized Taylor series [26], and the solution can be constructed in the form of a generalized Taylor series as well. The power elements of this series are determined using a recursive procedure.

## **5. Asymptotic Efficiency of RME Estimates**

The RME estimate yields the entropy-optimal PDFs (10) for the arrays of input and output data, each of size *r*. For the sake of convenience, consider the PDFs parametrized by the exponential Lagrange multipliers **z** = exp(−*λ*), taken componentwise: *zj* = exp(−*λj*). Then, equalities (10) take the form

$$P^\*\left(\theta, \mathbf{z}(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})\right) = \frac{\prod\_{j=1}^r [z\_j(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})]^{\phi(\mathbf{x}[j],\theta)}}{\displaystyle\int\_{\Theta} \prod\_{j=1}^r [z\_j(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})]^{\phi(\mathbf{x}[j],\theta)}\, d\theta},$$

$$Q\_t^\*\left(\xi[t], z\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})\right) = \frac{[z\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})]^{\xi[t]}}{\displaystyle\int\_{\Xi\_t} [z\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)})]^{\xi[t]}\, d\xi[t]}, \quad t = 1, \ldots, r. \tag{23}$$

Consequently, the structure of the PDF significantly depends on the values of the exponential Lagrange multipliers **z**, which, in turn, depend on the data arrays **y**(*r*) and **x**(*r*).
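Since the change of variables is purely algebraic (*zj* = exp(−*λj*) implies *zj*<sup>*ϕ*</sup> = exp(−*λjϕ*)), the product form (23) reproduces (10) exactly. A quick numerical sanity check with a hypothetical model *ϕ*(*x*, *θ*) = *θx* and invented multipliers:

```python
import numpy as np

n = 1000
theta = (np.arange(n) + 0.5) / n      # midpoint grid on Theta = [0, 1]
dth = 1.0 / n
x_obs = np.array([1.0, 2.0])          # toy inputs
lam = np.array([0.3, -0.1])           # invented Lagrange multipliers
z = np.exp(-lam)                      # exponential multipliers z_j = exp(-lam_j)

phi = theta[:, None] * x_obs[None, :] # phi(x[j], theta), shape (n, r)
p_lam = np.exp(-phi @ lam)            # exponential form of (10)
p_lam /= np.sum(p_lam) * dth
p_z = np.prod(z ** phi, axis=1)       # product form of (23)
p_z /= np.sum(p_z) * dth
assert np.max(np.abs(p_lam - p_z)) < 1e-10  # the two forms coincide
```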

**Definition 1.** *The estimates P*<sup>∗</sup>(*θ*, **z**<sup>∗</sup>) *and Q<sub>t</sub>*<sup>∗</sup>(*ξ*[*t*], *z<sub>t</sub>*<sup>∗</sup>) *are said to be asymptotically efficient if*

$$\begin{aligned} \lim\_{r \to \infty} P^\*\left( \theta, \mathbf{z}(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}) \right) &= P^\*(\theta, \mathbf{z}^\*), \\ \lim\_{r \to \infty} Q\_t^\*\left( \xi[t], z\_t(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}) \right) &= Q\_t^\*(\xi[t], z\_t^\*), \quad t = 1, \ldots, r, \end{aligned} \tag{24}$$

*where*

$$\mathbf{z}^\* = \lim\_{r \to \infty} \mathbf{z}(\mathbf{y}^{(r)}, \mathbf{x}^{(r)}). \tag{25}$$

Consider the empirical balance Equation (16), written in terms of the exponential Lagrange multipliers:

$$\Phi\_t(\mathbf{z}, \tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)}) = \int\_{\Theta} \prod\_{j=1}^r [z\_j(\tilde{\mathbf{y}}^{(r)}, \mathbf{x}^{(r)})]^{\phi(\mathbf{x}[j], \theta)} \left(\phi(\mathbf{x}[t], \theta) - \tilde{y}[t]\right) d\theta = 0, \quad t = 1, \ldots, r. \tag{26}$$

As has been demonstrated above, Equation (26) defines an implicit analytic function **z** = **z**(**y**˜(*r*), **x**(*r*)) for (**y**˜(*r*), **x**(*r*)) ∈ *R<sup>r</sup>* × *R<sup>r</sup>*.

Differentiating the left- and right-hand sides of these equations with respect to **y**˜(*r*) and **x**(*r*) yields

$$\frac{\partial \mathbf{z}}{\partial \tilde{\mathbf{y}}^{(r)}} = -\left[\frac{\partial \Phi}{\partial \mathbf{z}}\right]^{-1} \frac{\partial \Phi}{\partial \tilde{\mathbf{y}}^{(r)}}, \qquad \frac{\partial \mathbf{z}}{\partial \mathbf{x}^{(r)}} = -\left[\frac{\partial \Phi}{\partial \mathbf{z}}\right]^{-1} \frac{\partial \Phi}{\partial \mathbf{x}^{(r)}}. \tag{27}$$

Then, passing to the norms and using the inequality for the norm of a product of matrices [28], we obtain the inequalities

$$0 \le \left\| \frac{\partial \mathbf{z}}{\partial \tilde{\mathbf{y}}^{(r)}} \right\| \le \left\| \left[ \frac{\partial \Phi}{\partial \mathbf{z}} \right]^{-1} \right\| \left\| \frac{\partial \Phi}{\partial \tilde{\mathbf{y}}^{(r)}} \right\|, \qquad 0 \le \left\| \frac{\partial \mathbf{z}}{\partial \mathbf{x}^{(r)}} \right\| \le \left\| \left[ \frac{\partial \Phi}{\partial \mathbf{z}} \right]^{-1} \right\| \left\| \frac{\partial \Phi}{\partial \mathbf{x}^{(r)}} \right\|. \tag{28}$$

Both of the inequalities incorporate the norm of the inverse matrix [*∂*Φ/*∂***z**]<sup>−1</sup>.

**Lemma 1.** *Let a square matrix A be nonsingular, i.e.,* det *A* ≠ 0. *Then, there exists a constant α* > 1 *such that*

$$\frac{1}{||A||} \le ||A^{-1}|| \le \frac{\alpha}{||A||}. \tag{29}$$

**Proof of Lemma 1.** Since the matrix *A* is nondegenerate, the elements *a<sub>ik</sub>*<sup>(−1)</sup> of the inverse matrix *A*<sup>−1</sup> can be expressed in terms of the cofactor *A<sub>ki</sub>* of the element *a<sub>ki</sub>* in the determinant of the matrix *A* [28]:

$$a\_{ik}^{(-1)} = \frac{A\_{ki}}{\det A}, \qquad (k, i) = 1, \dots, r,$$

and they are bounded:

$$\left|a\_{ik}^{(-1)}\right| \le M < \infty, \qquad \|A^{-1}\| < \infty.$$

Hence, there exists a constant *α* > 1 for which inequality (29) is satisfied.
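Lemma 1 is easy to check numerically. Note that for any submultiplicative matrix norm with ‖*I*‖ = 1, the lower bound follows from 1 = ‖*I*‖ ≤ ‖*A*‖‖*A*<sup>−1</sup>‖, and *α* = ‖*A*‖‖*A*<sup>−1</sup>‖ (the condition number of *A*) witnesses the upper bound. A sketch with the spectral norm and an invented matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)  # invented nonsingular matrix

norm_A = np.linalg.norm(A, 2)                      # spectral norm ||A||
norm_inv = np.linalg.norm(np.linalg.inv(A), 2)     # ||A^{-1}||

# Lower bound of (29): 1 = ||I|| <= ||A|| * ||A^{-1}||
assert 1.0 / norm_A <= norm_inv
# Upper bound of (29) holds with alpha = ||A|| * ||A^{-1}|| = cond(A) >= 1
alpha = norm_A * norm_inv
assert norm_inv <= alpha / norm_A + 1e-12
```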

Lemma 1 can be applied to the norm of the inverse matrix [*∂*Φ/*∂***z**]<sup>−1</sup>. As a result,

$$\left(\left\|\frac{\partial\Phi}{\partial\mathbf{z}}\right\|\right)^{-1} \le \left\|\left[\frac{\partial\Phi}{\partial\mathbf{z}}\right]^{-1}\right\| \le \alpha\left(\left\|\frac{\partial\Phi}{\partial\mathbf{z}}\right\|\right)^{-1},\tag{30}$$

where

$$\left\| \frac{\partial \Phi}{\partial \mathbf{z}} \right\| = r \max\_{t,j} \left| \frac{\partial \Phi\_t}{\partial z\_j} \right|. \tag{31}$$

**Lemma 2.** *Let*

$$\|\frac{\partial\Phi}{\partial\mathbf{\bar{y}}^{(r)}}\| \le \varrho < \infty, \qquad \|\frac{\partial\Phi}{\partial\mathbf{x}^{(r)}}\| \le \omega < \infty. \tag{32}$$

*Then,*

$$\lim\_{r \to \infty} \| \frac{\partial \mathbf{z}}{\partial \mathbf{\tilde{y}}^{(r)}} \| = \lim\_{r \to \infty} \| \frac{\partial \mathbf{z}}{\partial \mathbf{x}^{(r)}} \| = 0. \tag{33}$$

**Proof of Lemma 2.** According to (28), (30), (31), and (32), we have

$$\left\|\frac{\partial \mathbf{z}}{\partial \tilde{\mathbf{y}}^{(r)}}\right\| \le \frac{1}{r} \left(\frac{\alpha\varrho}{b}\right), \qquad \left\|\frac{\partial \mathbf{z}}{\partial \mathbf{x}^{(r)}}\right\| \le \frac{1}{r} \left(\frac{\alpha\omega}{b}\right),$$

where *b* = max<sub>*t*,*j*</sub> |*∂*Φ<sub>*t*</sub>/*∂z<sub>j</sub>*|.

Hence, as the sample length *r* → ∞, the norms of the relevant Jacobians tend to zero, and the function **z** = **z**(**y**˜(*r*), **x**(*r*)) tends to the vector **z**<sup>∗</sup> defined in (25).
