#### *2.2. Necessary Assumptions Concerning Observation Model*

To state the minimax estimation problem (8) properly and guarantee the existence of its solution, we have to make additional assumptions concerning the uncertainty of *γ*, the observation model (1), and the estimated vector *h*:


$$\mathcal{L}(y, F) \geqslant L \tag{9}$$

holds for all $F \in \mathcal{F}_L$. The inequality (9) is called *the conformity constraint of the level L based on the likelihood function* (or, shortly, the likelihood constraint).


$$B(q,\varkappa)\, B^T(q,\varkappa) \succcurlyeq \lambda\_0 I > 0 \quad \text{for all } (q,\varkappa) \in \mathcal{C} \times \mathbb{R}^n.$$

(viii) The inequalities

$$\int\_{\mathbb{R}^k} ||v||^2 \phi\_V(v) dv < \infty,$$

$$\sup\_{q \in \mathcal{C}} \int\_{\mathbb{R}^n} ||A(q, \mathbf{x})||^2 \Psi(d\mathbf{x}|q) \stackrel{\triangle}{=} K\_A < \infty,$$

$$\sup\_{q \in \mathcal{C}} \int\_{\mathbb{R}^n} ||h(q, \mathbf{x})||^2 \Psi(d\mathbf{x}|q) \stackrel{\triangle}{=} K\_h < \infty$$

are true.

(ix) The set of admissible estimators $\mathcal{H}$ contains only the functions $\overline{h}(\cdot) : \mathbb{R}^k \to \mathbb{R}^l$ for which:

$$\sup\_{q \in \mathcal{C}} \int\_{\mathbb{R}^k} ||\overline{h}(y)||^2 \mathcal{L}(y|q)\, dy < \infty. \tag{10}$$

#### *2.3. Argumentation*

First, we discuss the meaning of the assumptions in the subsection above.

Conditions (i)–(iv), describing the set $\mathcal{F}_L$, have the following interpretation.

The requirement for $\mathcal{C}$ to be compact (i.e., fulfillment of condition (i)) is standard for minimax estimation problems (see, e.g., [2,3]). In the case when the prior information about the vector *γ* is limited to the knowledge of its domain $\mathcal{C}$ only, it is rather natural to treat *γ* as a random vector with an unknown distribution $F \in \mathcal{F}$. In practice we often have some additional prior information concerning the moment characteristics of *γ*; hence, the entire uncertainty set $\mathcal{F}$ can be significantly reduced. If, for example, $\mu(q) = \mathrm{col}(\mu_1(q), \ldots, \mu_N(q)) : \mathcal{C} \to \mathbb{R}^N$ is a vector of convex moment functions, and we know the vector $\overline{\mu} = \mathrm{col}(\overline{\mu}_1, \ldots, \overline{\mu}_N) \in \mathbb{R}^N$ of their upper bounds, then the set of admissible distributions takes the form $\left\{ F \in \mathcal{F} : \int_{\mathcal{C}} \mu_j(q) F(dq) \leqslant \overline{\mu}_j,\ j = \overline{1, N} \right\}$. The ∗-weak compactness and convexity can be easily verified for this subset. Further in the presentation, we do not stress the explicit form of the "total" constraints other than (9) forming the subset $\mathcal{F}_L$: they should just guarantee the closedness and convexity of $\mathcal{F}_L$. That is the sense of condition (ii).
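As a toy illustration of a moment-constrained subset of admissible distributions (the domain, moment functions, and bounds below are hypothetical, not taken from the paper), one can check the constraints $\int_{\mathcal{C}} \mu_j(q) F(dq) \leqslant \overline{\mu}_j$ for a discrete distribution on a grid over a one-dimensional $\mathcal{C}$:

```python
import numpy as np

# Hypothetical compact domain C = [0, 1], discretized; the moment functions
# and upper bounds are illustrative only.
q_grid = np.linspace(0.0, 1.0, 201)

moment_fns = [lambda q: q, lambda q: q**2]   # convex moment functions mu_j(q)
mu_bar = np.array([0.6, 0.5])                # assumed upper bounds mu_bar_j

def satisfies_moment_constraints(weights):
    """Check int_C mu_j(q) F(dq) <= mu_bar_j for a discrete F on q_grid."""
    moments = np.array([np.sum(f(q_grid) * weights) for f in moment_fns])
    return bool(np.all(moments <= mu_bar))

# The uniform discrete distribution on the grid is admissible here:
uniform = np.full(q_grid.size, 1.0 / q_grid.size)
print(satisfies_moment_constraints(uniform))   # True: moments ~ (0.5, 0.33)
```

Convexity of this subset is visible directly: each constraint is linear in the weights, so a convex combination of two admissible distributions is again admissible.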

The conditional pdf L(*y*|*q*) (3) can also be treated as the likelihood function of the parameter *γ*, calculated at the point *q* given the observed sample *Y* = *y*. This likelihood value reflects the relevance of the parameter value *q* to the realized observation *y*. By analogy, the function L(*y*, *F*) can be considered as some generalization of the likelihood function that evaluates the correspondence between the uncertain distribution *F* and the realized observation *y*. The following lower and upper bounds for this value are obvious:

$$0 < \underline{\mathcal{L}}(y) \stackrel{\triangle}{=} \min\_{q \in \mathcal{C}} \mathcal{L}(y|q) \; \leqslant \, \mathcal{L}(y, F) \; \leqslant \max\_{q \in \mathcal{C}} \mathcal{L}(y|q) \stackrel{\triangle}{=} \overline{\mathcal{L}}(y).$$

Below in the paper we suppose that the likelihood level *L* lies in $[\underline{\mathcal{L}}(y), \overline{\mathcal{L}}(y)]$. The subset formed by the constraint $\left\{ F \in \mathcal{F} : \int_{\mathcal{C}} \mathcal{L}(y|q) F(dq) \geqslant L \right\}$ is called *a distribution subset satisfying the likelihood conformity constraint of the level L*. It is nonempty because it contains at least all distributions with support lying within the set $\{ q \in \mathcal{C} : \mathcal{L}(y|q) \geqslant L \}$.
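The nonemptiness argument can be checked numerically. The sketch below assumes a hypothetical scalar model $Y = \gamma + V$ with $V \sim \mathcal{N}(0,1)$ (not the paper's general model (1)) and verifies that a distribution supported on $\{ q \in \mathcal{C} : \mathcal{L}(y|q) \geqslant L \}$ satisfies the likelihood constraint:

```python
import numpy as np

# Hypothetical scalar model Y = gamma + V with V ~ N(0, 1); the likelihood
# L(y|q) is then the standard normal pdf evaluated at y - q.
def lik(y, q):
    return np.exp(-0.5 * (y - q) ** 2) / np.sqrt(2.0 * np.pi)

q_grid = np.linspace(-1.0, 1.0, 401)   # discretized compact set C
y = 0.4                                # realized observation

L_low, L_up = lik(y, q_grid).min(), lik(y, q_grid).max()
L = 0.5 * (L_low + L_up)               # any level in [L_low, L_up] works

# Uniform distribution on the high-likelihood set {q in C : L(y|q) >= L}:
mask = lik(y, q_grid) >= L
weights = mask / mask.sum()
L_yF = np.sum(lik(y, q_grid) * weights)   # L(y, F) = int_C L(y|q) F(dq)
assert L_yF >= L                          # the likelihood constraint holds
```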

Adjusting the level *L*, we can vary the uncertainty set $\mathcal{F}_L$, choosing the distributions *F* which are more or less relevant to the realized observations *Y* = *y*. That is the essence of condition (iii). Condition (iv) is obvious: all the constraints defining the set $\mathcal{F}_L$ should be feasible.

Condition (v) is technical: it provides correctness of a subsequent change of measure. The condition is not restrictive because a broad class of functions *A*, *B* and *h* can be approximated by continuous functions. Conditions (vi) and (vii) guarantee correct utilization of the Fubini theorem and an abstract variant of the Bayes formula [19]. In practice these conditions are usually valid. Condition (viii) guarantees a finite variance for both the observations and the estimated vector, independently of the distribution *F*.

Condition (ix) guarantees a finite variance of the estimate $\overline{h}(Y)$ independently of $F \in \mathcal{F}_L$.

The solution to (8) is obvious in the case of the one-point set $\mathcal{F}_L = \{F\}$. This means the distribution *F* of the parameter *γ* is known, and the initial problem reduces to the traditional optimal in the mean square sense (MS-optimal) estimation problem. The case of the one-point set $\mathcal{C} = \{q\}$ is quite similar. In both cases the optimal estimator is completely defined by the conditional expectation (CE): $\overline{h}(y) = \mathrm{E}_F\{h(\gamma, X)\,|\,Y = y\}$ in the case of a known distribution *F*, and $\overline{h}(y) = \mathrm{E}_{\{q\}}\{h(q, X)\,|\,Y = y\}$ in the "one-point" case.
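For a known discrete distribution *F*, the CE estimator can be computed directly by the Bayes formula: the posterior weights are proportional to $\mathcal{L}(y|q_i) F(\{q_i\})$. A minimal sketch under a hypothetical scalar Gaussian channel, with the estimated function taken as $h(q, X) = q$ (i.e., estimating *γ* itself); all numbers are illustrative:

```python
import numpy as np

# Known discrete prior F on three points of C and a hypothetical scalar
# Gaussian channel Y = gamma + V, V ~ N(0, 1).
q_pts = np.array([-0.5, 0.0, 0.5])
F_w = np.array([0.25, 0.5, 0.25])

def lik(y, q):
    return np.exp(-0.5 * (y - q) ** 2) / np.sqrt(2.0 * np.pi)

def ce_estimate(y, h=lambda q: q):
    """MS-optimal estimator h_bar(y) = E_F{h(gamma) | Y = y} via Bayes."""
    post = lik(y, q_pts) * F_w
    post /= post.sum()        # posterior probabilities P(gamma = q_i | Y = y)
    return float(np.sum(h(q_pts) * post))

# The estimate is pulled from the prior mean (0) toward the observation:
print(ce_estimate(0.5))       # approximately 0.058
```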

In the general case of $\mathcal{F}_L$ this result is inapplicable, because the CE $\mathrm{E}_F\{h(\gamma, X)\,|\,Y = y\}$ is a functional of the unknown distribution *F*.

The stated estimation problem has a transparent interpretation. First, under prior uncertainty of the distribution *F*, the replacement of the loss function (6) by its guaranteeing analog looks natural. Second, utilization of the CE in the criterion means that the desired estimate should be calculated optimally for each observed sample. Criteria in the form of the CE appear often in estimation and control problems [11,17,20–22]. Often, estimation is a preliminary stage in the solution of an optimization and/or control problem under incomplete information. The random disturbances/noises in such observation systems represent:


The impact of the two latter types is not necessarily a nonrandom function of the available observations, but rather some "extra generated" random processes with distributions dependent on the observations. This type of control is used in the areas of telecommunications [25,26], cellular networks [27], technical systems [28], etc. The proposed minimax criterion allows inhibiting the negative effect of the "additional randomness" in the external signals (the third type of disturbances mentioned above) on the estimation quality.

Additional comprehension of the natural gaps inherent to the minimax estimation paradigm, and of the ways to partially bridge them, can be revealed by the following interpretation. It is well known that in the case when a minimax estimation problem can be reduced to a two-person game with a saddle point, the minimax estimator is the best one calculated for the least favorable distribution (LFD). The form of the LFD can be very strange and artificial. Moreover, the conformity degree of the LFD to the realized observations can be too low. Thus, the utilization of various sample conformity indices (particularly the ones based on the likelihood function) makes it possible to describe this degree, restrict it from below, implicitly reduce the distribution uncertainty set, and exclude "exotic" variants of the LFD.

Minimax estimation of the regression parameters has been investigated in various settings. Mostly, the observation model is a linear function of the estimated parameters corrupted by an additive Gaussian noise, and the optimality criterion is the mathematical expectation of some loss function. In [29], the problem is solved within the framework of fuzzy sets. The authors of [30,31] used a criterion other than the traditional mean square one, and the estimated vector was random with an uncertain discrete distribution. In [32], the Gaussian noises have an uncertain but bounded covariance matrix. The papers [33–35] are also devoted to minimax Bayesian estimation in regression under various geometric and moment constraints on the estimated parameters; the criterion functions are $\ell_p$ norms of the estimation errors.

The optimality criterion in the form of the CE and the admissibility of nonlinear estimates distinguish the proposed estimation problem from the recently considered ones [2,3,5–7,9]. A closely related problem considered in [11] has an essential difference: the uncertain parameter in [11] was treated as unknown and nonrandom, and hence the initial minimax problem could not be solved in terms of saddle points. Moreover, the statistical uncertainty in [11] gave no possibility to take into account any additional prior and posterior information about the moment characteristics, conformity indices, etc. The paper [14] was devoted to the particular case of the likelihood constraints only. An idea to use confidence sets, calculated from the available statistical data, as the uncertainty sets of the distribution moments was used in [36] for conditionally-minimax prediction.
