Article

Optimal Estimators of Cross-Partial Derivatives and Surrogates of Functions

by Matieyendou Lamboni 1,2

1 Department DFR-ST, University of Guyane, 97346 Cayenne, France
2 228-UMR Espace-Dev, University of Guyane, University of Réunion, IRD, University of Montpellier, 34090 Montpellier, France
Stats 2024, 7(3), 697-718; https://doi.org/10.3390/stats7030042
Submission received: 31 May 2024 / Revised: 4 July 2024 / Accepted: 12 July 2024 / Published: 14 July 2024
(This article belongs to the Section Statistical Methods)

Abstract

Computing cross-partial derivatives using fewer model runs is relevant in modeling, such as stochastic approximation, derivative-based ANOVA, the exploration of complex models, and active subspaces. This paper introduces surrogates of all the cross-partial derivatives of functions by evaluating such functions at N randomized points and using a set of L constraints. The randomized points rely on independent, central, and symmetric variables. The associated estimators, based on NL model runs, reach the optimal rates of convergence (i.e., $O(N^{-1})$), and the biases of our approximations do not suffer from the curse of dimensionality for a wide class of functions. Such results are used for (i) computing the main sensitivity indices and the upper bounds of the total sensitivity indices, and (ii) deriving emulators of simulators or surrogates of functions thanks to the derivative-based ANOVA. Simulations are presented to show the accuracy of our emulators and estimators of sensitivity indices. The plug-in estimates of indices using the U-statistics of one sample are numerically much more stable.

1. Introduction

Derivatives are relevant in many modeling tasks, such as inverse problems, first-order and second-order stochastic approximation methods [1,2,3,4], the exploration of complex mathematical models or simulators, derivative-based ANOVA (Db-ANOVA) or exact expansions of functions, and active subspaces. First-order derivatives or gradients are sometimes available in modeling. Instances are (i) models defined via their rates of change with respect to their inputs; (ii) implicit functions defined via their derivatives [5,6]; and (iii) the cases listed in [7,8] and the references therein.
In the FANOVA and sensitivity analysis user and developer communities (see, e.g., [9,10,11,12,13,14]), screening of the input variables and interactions of high-dimensional simulators is often performed before building emulators of such models using Gaussian processes [15,16,17,18,19], polynomial chaos expansions and SS-ANOVA [20,21], or other machine learning approaches. Emulators are fast-to-evaluate models that approximate complex and/or too-expensive simulators. Efficient variance-based screening methods rely on the upper bounds of generalized sensitivity indices, including Sobol' indices (see [14,22,23,24,25,26] for independent inputs and [27,28,29] for non-independent variables). Such upper bounds require the computation of cross-partial derivatives, even for simulators for which these computations are time-consuming or impossible. Also, active subspaces rely on first-order derivatives for performing dimension reduction and then approximating complex models [30,31,32,33].
For functions with full interactions, all the cross-partial derivatives are used in the integral representations of the infinitesimal increment of functions [34], and in the unanchored decompositions of functions in the Sobolev space [35]. Recently, such derivatives have become crucial in the Db-ANOVA representation of every smooth function, such as high-dimensional PDE models. Indeed, it is known, in [14], that every smooth function f admits an exact Db-ANOVA decomposition, that is, for all $\mathbf{x} \in \Omega \subseteq \mathbb{R}^d$,
$$ f(\mathbf{x}) = \mathbb{E}\left[f(\mathbf{X})\right] + \sum_{\substack{v \subseteq \{1,\ldots,d\} \\ |v|>0}} \mathbb{E}_{\mathbf{X}}\left[ D^{|v|} f(\mathbf{X}) \prod_{k \in v} \frac{G_k(X_k) - \mathbb{1}_{[X_k \geq x_k]}}{g_k(X_k)} \right], \qquad (1) $$
where $D^{|v|} f$ stands for the cross-partial derivative with respect to $x_k$ for every $k \in v$; $\mathbf{X} := (X_1, \ldots, X_d)$ is a random vector of independent variables, supported on an open set $\Omega$, with margins $G_j$ and densities $g_j$ (i.e., $X_j \sim G_j$, $j = 1, \ldots, d$).
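As a quick sanity check of Equation (1) (this small example is ours and is not taken from the cited references), consider $f(x_1, x_2) = x_1 x_2$ with independent inputs $X_1, X_2 \sim \mathcal{U}(0,1)$, so that $G_k(t) = t$ and $g_k = 1$. Since $\partial f/\partial x_1 = x_2$, $\partial f/\partial x_2 = x_1$, and $D^{2} f = 1$, the three terms of the sum are
$$ \mathbb{E}\left[X_2\left(X_1 - \mathbb{1}_{[X_1 \geq x_1]}\right)\right] = \tfrac{1}{2}\left(x_1 - \tfrac{1}{2}\right), \qquad \mathbb{E}\left[X_1\left(X_2 - \mathbb{1}_{[X_2 \geq x_2]}\right)\right] = \tfrac{1}{2}\left(x_2 - \tfrac{1}{2}\right), $$
$$ \mathbb{E}\left[\left(X_1 - \mathbb{1}_{[X_1 \geq x_1]}\right)\left(X_2 - \mathbb{1}_{[X_2 \geq x_2]}\right)\right] = \left(x_1 - \tfrac{1}{2}\right)\left(x_2 - \tfrac{1}{2}\right), $$
and adding them to $\mathbb{E}[f(\mathbf{X})] = 1/4$ recovers $x_1 x_2$ exactly.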
Computing all the cross-partial derivatives using a few model runs or evaluations of functions is challenging. Direct computations of accurate cross-partial derivatives were considered in [36,37] using the generalization of Richardson's extrapolation. Such approximations of all the cross-partial derivatives require a number of model runs that strongly depends on the dimensionality (i.e., d). While adjoint-based methods can provide the gradients of some ODE/PDE-based models using only one simulation of the adjoint model [38,39,40,41,42,43], note that computing the Hessian matrix requires running d second-order adjoint models in general, provided that such models are available (see [38,44]).
Stochastic perturbation methods or Monte Carlo approaches have been used in stochastic approximation (see, e.g., [1,2,3,4]), including derivative-free optimization (see [2,3,4,45,46,47,48] and the references therein), for computing the gradients and Hessians of functions. Such approaches lead to estimates of gradients using a number of model runs that can be less than the dimensionality [48,49]. While gradient computations and the convergence analysis are considered in first-order stochastic approximations, estimators of the Hessian matrices are investigated in second-order stochastic approximations [2,50,51,52,53]. Most of such approaches rely on Taylor expansions of functions and randomized kernels and/or a random vector that is uniformly distributed on the unit sphere. Nevertheless, independent variables are used in [48,50,53], and the approaches considered in [53,54] rely on the Stein identity [55]. Note that the upper bounds of the biases of such approximations depend on the dimensionality, except in [48] for the gradients only. Moreover, a convergence analysis for cross-partial derivatives beyond the second order is not available, to the best of our knowledge.
Given a smooth function defined on R d , the motivation of this paper consists of proposing new approaches for deriving surrogates of cross-partial derivatives and derivative-based emulators of functions that
  • Are simple to use and generic by making use of d independent variables that are symmetrically distributed about zero and a set of constraints;
  • Lead to dimension-free upper bounds of the biases related to the approximations of cross-partial derivatives for a wide class of functions;
  • Provide estimators of cross-partial derivatives that reach the optimal and parametric rates of convergence;
  • Can be used for computing all the cross-partial derivatives and emulators of functions at given points using a small number of model runs.
In this paper, new expressions of cross-partial derivatives of any order are derived in Section 3 by combining the properties of (i) the generalized Richardson extrapolation approach, so as to increase the accuracy of the approximations, and (ii) Monte Carlo approaches based only on independent random variables that are symmetrically distributed about zero. Such expressions are followed by their orders of approximation and biases. We also derive the estimators of such new expressions and their associated mean squared errors, including the rates of convergence for some classes of functions (see Section 3.3). Section 3.4 provides the derivative-based emulators of functions, depending on the strength of the interactions or (equivalently) of the cross-partial derivatives, thanks to Equation (1). The strength of the interactions can be assessed using sensitivity analysis. Thus, Section 4 deals with the derivation of new expressions of sensitivity indices and their estimators by making use of the proposed surrogates of cross-partial derivatives. Simulations based on test functions are considered in Section 5 to show the accuracy of our approach, and we conclude this work in Section 6.

2. Preliminary

For an integer $d \in \mathbb{N}\setminus\{0\}$, let $\mathbf{X} := (X_1, \ldots, X_d)$ be a random vector of d independent and continuous variables with marginal cumulative distribution functions (CDFs) $F_j$, $j = 1, \ldots, d$, and probability density functions (PDFs) $\rho_j$, $j = 1, \ldots, d$.
For a non-empty subset $u \subseteq \{1, \ldots, d\}$, we use $|u|$ for its cardinality (i.e., the number of elements in u) and $(\sim u) := \{1, \ldots, d\} \setminus u$. Also, we use $\mathbf{X}_u := (X_j,\, j \in u)$ for a subset of inputs, and we have the partition $\mathbf{X} = (\mathbf{X}_u, \mathbf{X}_{\sim u})$. Assume that:
Assumption 1 
(A1).  X is a random vector of independent variables, supported on Ω.
Working with partial derivatives requires a specific mathematical space. Given an integer $n \in \mathbb{N}\setminus\{0\}$ and an open set $\Omega \subseteq \mathbb{R}^d$, consider a weakly partially differentiable function $f : \Omega \to \mathbb{R}^n$ [56,57] and a subset $v \subseteq \{1,\ldots,d\}$ with $|v| > 0$. Namely, we use $D^{|v|} f := \prod_{k \in v} \frac{\partial}{\partial x_k} f$ for the $|v|$-th weak cross-partial derivatives of each component of f with respect to each $x_k$ with $k \in v$.
Likewise, given $\boldsymbol{\imath} := (i_1, \ldots, i_d) \in \mathbb{N}^d$, denote $D^{(\boldsymbol{\imath})} f := \prod_{k=1}^{d} \frac{\partial^{i_k}}{\partial x_k^{i_k}} f$, and $\mathbb{1}_v(j) = 1$ if $j \in v$ and zero otherwise. Thus, taking $v := \left(\mathbb{1}_v(1), \ldots, \mathbb{1}_v(d)\right)$ yields $D^{|v|} f = D^{(v)} f$. Moreover, denote $(\mathbf{x})^{\boldsymbol{\imath}} = \mathbf{x}^{\boldsymbol{\imath}} := \prod_{k=1}^{d} x_k^{i_k}$, $\boldsymbol{\imath}! := i_1! \cdots i_d!$, and consider the Hölder space of $\alpha$-smooth functions given by, for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$,
$$ \mathcal{H}_\alpha := \left\{ f : \mathbb{R}^d \to \mathbb{R} \; : \; \left| f(\mathbf{x}) - \sum_{0 \leq i_1 + \cdots + i_d \leq \alpha - 1} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{y})}{\boldsymbol{\imath}!} \left(\mathbf{x} - \mathbf{y}\right)^{\boldsymbol{\imath}} \right| \leq M_\alpha \left\| \mathbf{x} - \mathbf{y} \right\|_2^{\alpha} \right\}, $$
with $\alpha \geq 1$, $M_\alpha > 0$, and the $D^{(\boldsymbol{\imath})} f(\mathbf{y})$'s being weak cross-partial derivatives. We use $\|\cdot\|_2$ for the Euclidean norm, $\|\cdot\|_1$ for the $L_1$-norm, $\mathbb{E}(\cdot)$ for the expectation, and $\mathbb{V}(\cdot)$ for the variance.

3. Surrogates of Cross-Partial Derivatives and New Emulators of Functions

3.1. New Expressions of Cross-Partial Derivatives

This section aims at providing expressions of cross-partial derivatives using the model of interest, and new independent random vectors. We provide approximated expressions of D | u | f ( x ) for all u { 1 , , d } and the associated orders of approximations.
Given $L, q \in \mathbb{N}\setminus\{0\}$, consider $\beta_\ell \in \mathbb{R}$ with $\ell = 1, \ldots, L$, $\mathbf{h} := (h_1, \ldots, h_d) \in \mathbb{R}_+^d$, and denote with $\mathbf{V} := (V_1, \ldots, V_d)$ a d-dimensional random vector of independent variables satisfying, for all $j \in \{1, \ldots, d\}$,
$$ \mathbb{E}\left[V_j\right] = 0; \qquad \mathbb{E}\left[V_j^2\right] = \sigma^2; \qquad \mathbb{E}\left[V_j^{2q+1}\right] = 0; \qquad \mathbb{E}\left[V_j^{2q}\right] < +\infty. $$
Random vectors of d independent variables that are symmetrically distributed about zero are instances of V , including the standard Gaussian random vector and symmetric uniform distributions about zero.
  • Denote $\beta_\ell \mathbf{h}\mathbf{V} := (\beta_\ell h_1 V_1, \ldots, \beta_\ell h_d V_d)$. The reals $\beta_\ell$'s are used for controlling the order of the derivatives (i.e., $|u|$) we are interested in, while the $V_j$'s help in selecting one particular derivative of order $|u|$. Finally, the $h_j$'s aim at defining a neighborhood of a sample point $\mathbf{x}$ of $\mathbf{X}$ that will be used. Thus, using $\beta_{max} := \max\left(|\beta_1|, \ldots, |\beta_L|\right)$ and keeping in mind the variance of $\beta_\ell h_j V_j$, we assume that, for all $j \in \{1, \ldots, d\}$:
Assumption 2 
(A2).  $\beta_{max}\, h_j\, \sigma \leq 1/2$ or, equivalently, $0 < \beta_{max}\, h_j\, |V_j| \leq 1$ for bounded $V_j$'s.
Based on the above framework, Theorem 1 provides a new expression of the cross-partial derivatives of f. Recall that $|u|$ is the cardinality of u and $D^{|u|} f = D^{(u)} f$.
Theorem 1. 
Consider distinct $\beta_\ell$'s, and assume that $f \in \mathcal{H}_\alpha$ with $\alpha \geq |u| + 2L$ and that (A2) holds. Then, for any $u \subseteq \{1,\ldots,d\}$ with $|u| > 0$, there exist $\alpha_{|u|} \in \{1,\ldots,L\}$ and coefficients $C_1^{(|u|)}, \ldots, C_L^{(|u|)}$ such that
$$ D^{|u|} f(\mathbf{x}) = \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] + O\left( \|\mathbf{h}\|_2^{2\alpha_{|u|}} \right). \qquad (2) $$
Proof. 
The detailed proof is provided in Appendix A.    □
In view of Theorem 1, we are able to compute all the cross-partial derivatives using the same evaluations of the function, with the same or different orders of approximation, depending on the constraints imposed to determine the coefficients $C_1^{(|u|)}, \ldots, C_L^{(|u|)}$ (see Appendix A). While the setting $L=1$, $\beta_1 = 1$, $C_1^{(|u|)} = 1$ or the constraints $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$, $r = 0, \ldots, L-2, |u|$, with $L - 1 \leq |u|$, lead to the order $O\left(\|\mathbf{h}\|_2^2\right)$, one can increase that order up to $O\left(\|\mathbf{h}\|_2^{2L}\right)$ by using either $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r+|u|} = \delta_{0,r}$, $r = 0, 2, \ldots, 2(L-1)$, or the full set of constraints given by $\sum_{\ell=1}^{L+|u|} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$, $r = 0, \ldots, 2L + |u| - 1$. The last setting improves the approximations and the numerical computations of the derivatives. Since increasing the number of constraints requires more evaluations of the simulator, and since it is common in ANOVA-like decompositions of $f(\mathbf{X})$ to neglect the higher-order components or, equivalently, the higher-order cross-partial derivatives (thanks to Equation (1)), the following parsimonious set of constraints may be considered. Given an integer $r^* > 0$, controlling the partial derivatives of order up to $r^* \leq L - 2$ can be performed using the constraints
$$ \begin{cases} \sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}; & r = 0, 1, \ldots, L-1 \quad \text{if } |u| = 1, \ldots, r^*, \\[4pt] \sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}; & r = 0, \ldots, r^*, |u|, \ldots, |u| + L - r^* - 2 \quad \text{otherwise}. \end{cases} \qquad (3) $$
Equation (3) gives approximations of all the cross-partial derivatives of order $O\left(\|\mathbf{h}\|_2^{\alpha_{|u|}}\right)$, where $o := L - |u|$ if $1 \leq |u| \leq r^*$ and $o := L - r^* - 1$ otherwise, and $\alpha_{|u|} = o$ if $o$ is even and $o + 1$ otherwise. This equation relies on the Vandermonde matrices and the generalized Vandermonde matrices, which ensure the existence and uniqueness of the coefficients for distinct values of the $\beta_\ell$'s (i.e., $\beta_{\ell_1} \neq \beta_{\ell_2}$), because the determinant is of the form $\prod_{1 \leq \ell_1 < \ell_2 \leq L} \left(\beta_{\ell_1} - \beta_{\ell_2}\right)$ (see [58,59] for more details and the inverse of such matrices).
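As an illustration of how the coefficients can be obtained in practice, the short R sketch below (ours, not part of the paper's code) solves such a Vandermonde-type system for a given set of distinct nodes $\beta_\ell$ and constraint exponents r.

```r
# Sketch (ours): solve the Vandermonde-type system for the coefficients
# C_1^(|u|), ..., C_L^(|u|), given distinct nodes beta_l and the L constraint
# exponents r of Equation (3); delta_{|u|, r} is the right-hand side.
solve_coefficients <- function(beta, exponents, u_card) {
  stopifnot(length(beta) == length(exponents))
  A   <- outer(exponents, beta, function(r, b) b^r)    # A[m, l] = beta_l^{r_m}
  rhs <- as.numeric(exponents == u_card)                # delta_{|u|, r}
  solve(A, rhs)
}

# Example: L = 3, beta = (1, -1, 1/2), first-order derivative (|u| = 1),
# controlling r = 0, 1, 2 as in the first case of Equation (3).
C <- solve_coefficients(beta = c(1, -1, 0.5), exponents = c(0, 1, 2), u_card = 1)
sum(C * c(1, -1, 0.5))   # should equal 1 (the constraint with r = |u| = 1)
```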
Remark 1. 
When $L = 1$, we must have $\beta_1 = 1$, $C_1^{(|u|)} = 1$, $\forall\, u \subseteq \{1,\ldots,d\}$. Thus, the coefficient $C_1^{(|u|)}$ does not necessarily depend on $|u|$. Taking L as an even integer, the following nodes may be considered: $\{\beta_1, \ldots, \beta_L\} = \left\{\pm 2^{k},\; k = 0, \ldots, \tfrac{L-2}{2}\right\}$. When L is odd, one may add 0 to the above set. Of course, other possibilities can be considered provided that $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{|u|} = 1$.
Remark 2. 
For a given $u_0 \subseteq \{1,\ldots,d\}$, if we are only interested in all the $|v|$-th cross-partial derivatives with $v \subseteq u_0$, it is better to set $\mathbf{h}_{\sim u_0} = \mathbf{0}$ in Equation (2).
Remark 3. 
Links to other works.
Consider $\beta_1 = 1$, $\beta_2 = -1$, $\beta_3 = 0$, and $V_j \sim \mathcal{N}(0,1)$ with $j = 1, \ldots, d$. Using $L = 1$, $L = 2$, or $L = 3$, our estimators of the first-order and second-order cross-partial derivatives are very similar to the results obtained in [53].
Using uniform perturbations and $L = 2$ or $L = 3$, our estimators of the first-order and second-order cross-partial derivatives are similar to those provided in [50]. However, we will see later that specific uniform distributions allow for obtaining dimension-free upper bounds of the biases.

3.2. Upper Bounds of Biases

To derive precise biases of our approximations provided in Theorem 1, different structural assumptions on the deterministic function f and on $\mathbf{V}$ are considered. The assumption $f \in \mathcal{H}_\alpha$ with $\alpha \geq |u|$ is sufficient to define $D^{|u|} f(\mathbf{x})$ for any $u \subseteq \{1, \ldots, d\}$. Note that such an assumption does not depend on the dimensionality d. For the sake of generality, we provide the upper bounds of the biases for any value of L by considering two sets of constraints.
Denote with $\mathbf{R} := (R_1, \ldots, R_d)$ a d-dimensional random vector of independent variables that are centered about zero and standardized (i.e., $\mathbb{E}\left[R_k^2\right] = 1$, $k = 1, \ldots, d$), and with $\mathcal{R}_c$ the set of such random vectors. For any $r \in \mathbb{N}$, define
$$ \Gamma_r := \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{r} \right|; \qquad K_{1,L} := \inf_{\mathbf{R} \in \mathcal{R}_c} \mathbb{E}\left[ \left\|\mathbf{R}\right\|_2^{2L} \prod_{k \in u} R_k^2 \right] \Gamma_{|u|+2L}. $$
Corollary 1. 
Consider distinct $\beta_\ell$'s and the constraints $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{|u|+r} = \delta_{0,r}$ with $r = 0, 2, 4, \ldots, 2(L-1)$. If $f \in \mathcal{H}_{|u|+2L}$ and (A2) hold, then there is $M_{|u|+2L} > 0$ such that
$$ \left| D^{|u|} f(\mathbf{x}) - \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] \right| \leq \sigma^{2L}\, M_{|u|+2L}\, K_{1,L}\, \|\mathbf{h}\|_2^{2L}. \qquad (4) $$
Moreover, if $V_k \sim \mathcal{U}(-\xi, \xi)$ with $\xi > 0$ and $k = 1, \ldots, d$, then
$$ \left| D^{|u|} f(\mathbf{x}) - \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] \right| \leq M_{|u|+2L}\, \xi^{2L}\, \left\|\mathbf{h}^2\right\|_1^{L}\, \Gamma_{|u|+2L}. \qquad (5) $$
Proof. 
See Appendix B for the detailed proof.    □
In view of Corollary 1, one obtains upper bounds that do not depend on the dimensionality d by choosing, for instance, $\xi = \sigma = 1/\sqrt{d}$ and $h_k = h$. When $\Gamma_{|u|+2L} > 1$, the choice $\xi = \sigma = d^{-1/2}\, \Gamma_{|u|+2L}^{-\frac{1}{2L}}$ is more appropriate. Corollary 1 provides the results for highly smooth functions. To be able to derive the optimal rates of convergence for a wide class of functions (i.e., $\mathcal{H}_{|u|+1}$), Corollary 2 starts by providing the biases for this class of functions under a specific set of constraints. To that end, define
$$ K_1 := \inf_{\mathbf{R} \in \mathcal{R}_c} \mathbb{E}\left[ \left\|\mathbf{R}\right\|_2 \prod_{k \in u} R_k^2 \right]. $$
Corollary 2. 
For distinct $\beta_\ell$'s, consider $r^* \in \{0, \ldots, |u|-1\}$ and the constraints $\sum_{\ell=1}^{L=r^*+2} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$ with $r = 0, 1, \ldots, r^*, |u|$. If $f \in \mathcal{H}_{|u|+1}$ and (A2) hold, then there is $M_{|u|+1} > 0$ such that
$$ \left| D^{|u|} f(\mathbf{x}) - \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] \right| \leq \sigma\, M_{|u|+1}\, K_1\, \|\mathbf{h}\|_2\, \Gamma_{|u|+1}. \qquad (6) $$
Moreover, if $V_k \sim \mathcal{U}(-\xi, \xi)$ with $\xi > 0$ and $k = 1, \ldots, d$, then
$$ \left| D^{|u|} f(\mathbf{x}) - \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] \right| \leq \xi\, M_{|u|+1}\, \|\mathbf{h}\|_1\, \Gamma_{|u|+1}. \qquad (7) $$
Proof. 
See Appendix C.    □
Note that the upper bounds derived in Corollary 2 depend on $r^* \in \{0, \ldots, |u|-1\}$ through $L = r^* + 2$ and $\Gamma_{|u|+1} = \sum_{\ell=1}^{L=r^*+2} \left| C_\ell^{(|u|)} \beta_\ell^{|u|+1} \right|$. Thus, taking $h_k = h$ and $\xi = 1/(d\, \Gamma_{|u|+1})$ gives a dimension-free upper bound that does not increase with $r^*$. The crucial role and importance of $r^*$ are highlighted in Section 3.3.
Remark 4. 
When $r^* = 0$, which implies that $L = 2$, consider $\beta_1 = 1$, $\beta_2 = -1$; $C_1^{(|u|)} = 1/2$; $C_2^{(|u|)} = 1/2$ when $|u|$ is even and $C_2^{(|u|)} = -1/2$ otherwise. With the above choices, the upper bounds given by Equations (6) and (7) become, respectively,
$$ \sigma\, M_{|u|+1}\, K_1\, \|\mathbf{h}\|_2, \qquad \xi\, M_{|u|+1}\, \|\mathbf{h}\|_1. $$
We can check that the same results hold when using $L = 1$, $\beta_1 = 1$, and $C_1^{(|u|)} = 1$.
Remark 5. 
It is worth noting that we obtain exact approximations of $D^{|u|} f(\mathbf{x})$ in Corollary 1 for the class of functions described by
$$ \mathcal{B}_0 := \left\{ h \in \mathcal{H} \; : \; D^{(\boldsymbol{\imath})} h = 0, \;\; \forall\, \boldsymbol{\imath} \in \mathbb{N}^d \text{ such that } \|\boldsymbol{\imath}\|_1 \geq |u| + 2L \right\}. $$
In general, exact approximations of $D^{|u|} f(\mathbf{x})$ are obtained when $L \to \infty$ for highly smooth functions.

3.3. Convergence Analysis

Given a sample of $\mathbf{V}$, that is, $\left\{\mathbf{V}_i := (V_{i,1}, \ldots, V_{i,d})\right\}_{i=1}^{N}$, and using Equation (2), the method of moments implies that the estimator of $D^{|u|} f(\mathbf{x})$ is given by
$$ \widehat{D^{|u|} f}(\mathbf{x}) := \frac{1}{N} \sum_{i=1}^{N} \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}_i\right) \prod_{k \in u} \frac{V_{i,k}}{h_k \sigma^2}. $$
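As an illustration, the following R sketch implements this estimator in the simple setting of Remark 4 ($L = 2$, $\beta_1 = 1$, $\beta_2 = -1$, $C_1 = 1/2$ and $C_2 = \pm 1/2$), with uniform perturbations $V_k \sim \mathcal{U}(-\xi, \xi)$; the function and variable names are ours and purely illustrative.

```r
# Sketch of the Monte Carlo estimator of D^{|u|} f(x) in the setting of
# Remark 4 (L = 2, beta = (1, -1), C_1 = 1/2 and C_2 = 1/2 if |u| is even,
# -1/2 otherwise), with uniform perturbations V_k ~ U(-xi, xi).
# All names are illustrative, not the paper's code.
estimate_cross_partial <- function(f, x, u, N = 1e4, h = 0.05, xi = 1) {
  d <- length(x)
  sigma2 <- xi^2 / 3                          # variance of U(-xi, xi)
  C2 <- if (length(u) %% 2 == 0) 0.5 else -0.5
  est <- numeric(N)
  for (i in seq_len(N)) {
    V  <- runif(d, min = -xi, max = xi)       # independent, symmetric about zero
    fp <- f(x + h * V)                        # beta_1 = 1
    fm <- f(x - h * V)                        # beta_2 = -1
    est[i] <- (0.5 * fp + C2 * fm) * prod(V[u] / (h * sigma2))
  }
  mean(est)
}

# Example: the cross-partial of f(x) = x1 * x2 + x3^2 w.r.t. (x1, x2) equals 1.
f_test <- function(x) x[1] * x[2] + x[3]^2
estimate_cross_partial(f_test, x = c(0.3, 0.7, 0.5), u = c(1, 2))
```

The same N evaluations of f at the perturbed points can be reused for every subset u, which is precisely the practical appeal of Theorem 1.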
Statistically, it is common to measure the quality of an estimator using the mean squared error (MSE), including the rates of convergence. The MSEs can also help in determining the optimal value of h . Theorem 2 provides such quantities under different assumptions. To that end, define
$$ K_{2,r} := \inf_{\mathbf{R} \in \mathcal{R}_c} \mathbb{E}\left[ \left\|\mathbf{R}\right\|_2^{2r} \prod_{k \in u} R_k^2 \right]. $$
Theorem 2. 
For distinct $\beta_\ell$'s, consider $r^* \in \mathbb{N}$ and the constraints $\sum_{\ell=1}^{L=r^*+2} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$ with $r = 0, 1, \ldots, r^*, |u|$ and $r^* \leq |u| - 1$. If $f \in \mathcal{H}_{|u|+1}$ and (A2) hold, then
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] \leq \sigma^2 M_{|u|+1}^2 K_1^2 \Gamma_{|u|+1}^2\, \|\mathbf{h}\|_2^2 + \frac{M_{r^*+1}^2 \Gamma_{r^*+1}^2 K_{2,r^*+1}}{N \sigma^{2(|u|-r^*-1)} \prod_{k \in u} h_k^2}\, \|\mathbf{h}\|_2^{2(r^*+1)}. \qquad (8) $$
Moreover, if $V_k \sim \mathcal{U}(-\xi, \xi)$ with $k = 1, \ldots, d$, then
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] \leq M_{|u|+1}^2\, \|\mathbf{h}\|_1^2\, \xi^2\, \Gamma_{|u|+1}^2 + \frac{3^{|u|} M_{r^*+1}^2 \Gamma_{r^*+1}^2}{N \xi^{2(|u|-r^*-1)} \prod_{k \in u} h_k^2}\, \|\mathbf{h}\|_2^{2(r^*+1)}. \qquad (9) $$
Proof. 
See Appendix D for the detailed proof.    □
Theorem 2 provides the upper bounds of the MSEs for the anisotropic case. Using a uniform bandwidth, that is, $h_k = h$, reveals that such upper bounds clearly depend on the dimensionality of the function of interest. Indeed, we can check that the upper bounds of the MSEs provided in Equations (8) and (9) become, respectively,
$$ \sigma^2 M_{|u|+1}^2 K_1^2 \Gamma_{|u|+1}^2\, d\, h^2 + \frac{M_{r^*+1}^2 \Gamma_{r^*+1}^2 K_{2,r^*+1}\, d^{\,r^*+1}}{N \sigma^{2(|u|-r^*-1)} h^{2(|u|-r^*-1)}}, $$
$$ \xi^2 M_{|u|+1}^2 \Gamma_{|u|+1}^2\, d^2 h^2 + \frac{3^{|u|} M_{r^*+1}^2 \Gamma_{r^*+1}^2\, d^{\,r^*+1}}{N \xi^{2(|u|-r^*-1)} h^{2(|u|-r^*-1)}}. $$
By minimizing such upper bounds with respect to h, the optimal rates of convergence of the proposed estimators are derived in Corollary 3.
Corollary 3. 
Under the assumptions made in Theorem 2, if $r^* < |u| - 1$, then
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] = O\left( N^{-\frac{1}{|u| - r^*}}\, d^{\,1 + \frac{|u|-1}{|u| - r^*}} \right). $$
Proof. 
See Appendix E for the detailed proof.    □
The optimal rates of convergence obtained in Corollary 3 are far from the parametric ones, and such rates decrease with $|u|$. Nevertheless, such optimal rates are functions of $d^2$ for any $|u| \geq 2$ when using $r^* = 1$. The maximum rate of convergence that can be reached is $N^{-1/2}\, d^{\frac{|u|+1}{2}}$ by taking $r^* = |u| - 2$.
To derive the optimal and parametric rates of convergence, let us now choose $r^* = |u| - 1$ and $h_k = h$ with $k = 1, \ldots, d$. Thus, we can see that the second terms of the upper bounds of the MSEs (provided in Theorem 2) are functions of $d^{|u|}$, but they are independent of h. This key observation leads to Corollary 4.
Corollary 4. 
Under the assumptions made in Theorem 2, if $r^* = |u| - 1$, $\xi = 1/(d\, \Gamma_{|u|+1})$, and $h_k = h \sim N^{-\gamma/2}$ with $\gamma \in [1, 2]$, then we have
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] = O\left( N^{-1}\, d^{\,|u|} \right). $$
Proof. 
The proof is straightforward, since $h^2 \sim N^{-\gamma}$ and $N h \to \infty$ as $N \to \infty$. □
It is worth noting that the upper bound of the squared bias obtained in Corollary 4 does not depend on the dimensionality thanks to ξ . Also, the optimal and parametric rates of convergence are reached by means of L N = ( | u | + 1 ) N model evaluations, and such model runs can still be used for computing D | v | f ( x ) for every v { 1 , , d } with | v | | u | . Based on the same assumptions, it appears that the results provided in Corollary 4 are much more convenient for | u | = 1 , 2 in higher dimensions, while those obtained in Corollary 3 are well suited for higher dimensions and for higher values of | u | { 1 , , d } .
For highly smooth functions and for large values of $|u|$ and d, we are able to derive intermediate rates of convergence of the estimator of $D^{|u|} f$ (see Theorem 3). To that end, consider an integer $L' \geq 1$ such that $L = r^* + L' + 1$, and denote with $[b]$ the largest integer that is less than b for any real b.
Theorem 3. 
For an integer $r^* \leq |u| - 2$, consider $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$ for $r = 0, 1, \ldots, r^*, |u|, |u|+2, |u|+4, \ldots, |u| + 2(L'-1)$. If $f \in \mathcal{H}_{|u|+2L'}$ and (A2) hold, then
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] \leq \sigma^{4L'} M_{|u|+2L'}^2 K_{1,L'}^2\, d^{L'} h^{4L'} + \frac{M_{r^*+1}^2 \Gamma_{r^*+1}^2 K_{2,r^*+1}\, d^{\,r^*+1}}{N \sigma^{2(|u|-r^*-1)} h^{2(|u|-r^*-1)}}. $$
Moreover, if $V_k \sim \mathcal{U}(-\xi, \xi)$ with $\xi > 0$ and $k = 1, \ldots, d$, then
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] \leq \xi^{4L'} M_{|u|+2L'}^2 \Gamma_{|u|+2L'}^2\, d^{2L'} h^{4L'} + \frac{3^{|u|} M_{r^*+1}^2 \Gamma_{r^*+1}^2\, d^{\,r^*+1}}{N \xi^{2(|u|-r^*-1)} h^{2(|u|-r^*-1)}}. $$
For a given $0 \leq \epsilon_{op} < 1$, taking $L' = \left[ \frac{(|u| - r^* - 1)(1 - \epsilon_{op})}{2\epsilon_{op}} \right]$ leads to
$$ \mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] = O\left( N^{-1+\epsilon_{op}}\, d^{\,|u|(1-\epsilon_{op})} \right). $$
Proof. 
Detailed proofs are provided in Appendix F.    □
It turns out that the optimal rate of convergence derived in Theorem 3 is a trade-off between the sample size N and the dimensionality d. For instance, when $\epsilon_{op} = 1/2$, the optimal rate becomes $O\left(N^{-1/2} d^{\,|u|/2}\right)$, which improves the rate obtained in Corollary 3, but under different assumptions.
Remark 6. 
Since the bias vanishes for the class of functions $\mathcal{B}_0$, taking $\sigma = 1/h$ yields $\mathbb{E}\left[ \left( \widehat{D^{|u|} f}(\mathbf{x}) - D^{|u|} f(\mathbf{x}) \right)^2 \right] = O\left(N^{-1}\right)$ (see Appendix G). Note that such an optimal rate of convergence is dimension-free.

3.4. Derivative-Based Emulators of Smooth Functions

Using Equation (1) and bearing in mind the estimators of the cross-derivatives provided in Section 3.3, this section aims at providing surrogates of smooth functions, also known as emulators. The general expression of the surrogate of f is given below.
Corollary 5. 
For any $|u| \in \{1, \ldots, d\}$, consider the constraints $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = \delta_{|u|,r}$, $r = 0, 1, \ldots, r^* = \max(|u|-1, 3), |u|$ (so that $L \geq 5$). Assume that $f \in \mathcal{H}_{d+1}$ and that (A1) and (A2) hold. Then, an approximation of f at $\mathbf{x} \in \Omega$ is given by
$$ \widehat{f}(\mathbf{x}) := \mathbb{E}\left[f(\mathbf{X})\right] + \sum_{\substack{v \subseteq \{1,\ldots,d\} \\ |v|>0}} \mathbb{E}_{\mathbf{X}}\left[ \widehat{D^{|v|} f}(\mathbf{X}) \prod_{k \in v} \frac{G_k(X_k) - \mathbb{1}_{[X_k \geq x_k]}}{g_k(X_k)} \right] \;\overset{P}{\longrightarrow}\; f(\mathbf{x}). $$
The above plug-in estimator is consistent using the law of large numbers. For a given x , it is worth noting that the choice of G j s is arbitrary, provided that such distributions are supported on an open neighborhood of x .
Often, the higher-order cross-partial derivatives or, equivalently, the higher-order interactions among the model inputs almost vanish, leading us to consider truncated expressions. Given an integer s with $0 < s \leq d$ and keeping in mind the ANOVA decomposition, consider the class of functions that admit at most s-th-order interactions, that is, $\mathcal{A}_s := \left\{ h : \mathbb{R}^d \to \mathbb{R} \; : \; h(\mathbf{x}) = \sum_{v \subseteq \{1,\ldots,d\},\, |v| \leq s} h_v(\mathbf{x}_v) \right\}$. Truncating the functional expansion is a standard practice within the ANOVA community, that is, $s \ll d$ is assumed in higher dimensions [10,12]. For such a class of functions, requiring $f \in \mathcal{H}_{\alpha_s}$ with $\alpha_s \geq s$ is sufficient to derive our results. Thus, the truncated surrogate of f is given by
$$ \widehat{f_s}(\mathbf{x}) := \mathbb{E}\left[f(\mathbf{X})\right] + \sum_{\substack{v \subseteq \{1,\ldots,d\} \\ 0 < |v| \leq s}} \mathbb{E}_{\mathbf{X}}\left[ \widehat{D^{|v|} f}(\mathbf{X}) \prod_{k \in v} \frac{G_k(X_k) - \mathbb{1}_{[X_k \geq x_k]}}{g_k(X_k)} \right]. $$
Under the assumptions made in Corollary 5, $\widehat{f_s}(\mathbf{x})$ reaches the optimal and parametric rate of convergence for the class of functions $\mathcal{A}_3$. For instance, taking $s = 1$ leads to the first-order emulator of f, which relies only on the gradient information. Thus, $\widehat{f_{s=1}}$ provides accurate estimates of additive models of the form $\sum_{j=1}^{d} h_j(x_j)$, where the $h_j$'s are given functions. Likewise, $\widehat{f_2}$ allows for incorporating the second-order terms, but it requires the second-order cross-partial derivatives. Thus, it is relevant to identify the class of functions $\mathcal{A}_s$ that contains the model of interest before building emulators. The following section deals with such issues.
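A minimal R sketch of the first-order emulator $\widehat{f_{s=1}}$ is given below, assuming independent $\mathcal{U}(0,1)$ inputs with $G_j = F_j$ (so $G_j(t) = t$ and $g_j = 1$) and the simple $L = 2$, $\beta = (1, -1)$ setting for the gradient estimates; all function names and default values are illustrative and are not taken from the paper's code.

```r
# Sketch of the first-order emulator f_hat_{s=1} of Equation (1), assuming
# independent U(0,1) inputs with G_j = F_j (so G_j(t) = t and g_j = 1).
# Gradients are estimated with L = 2, beta = (1, -1), C = (1/2, -1/2).
# All names and default values are illustrative, not the paper's code.
build_first_order_emulator <- function(f, d, N1 = 200, N0 = 50, h = 0.05, xi = 1) {
  sigma2 <- xi^2 / 3                                # variance of U(-xi, xi)
  X <- matrix(runif(N1 * d), nrow = N1)             # points X_i ~ U(0,1)^d
  grad <- matrix(0, nrow = N1, ncol = d)            # estimated gradients at X_i
  for (i in seq_len(N1)) {
    for (m in seq_len(N0)) {
      V  <- runif(d, -xi, xi)
      df <- (f(X[i, ] + h * V) - f(X[i, ] - h * V)) / 2
      grad[i, ] <- grad[i, ] + df * V / (h * sigma2) / N0
    }
  }
  f_mean <- mean(apply(X, 1, f))                    # estimate of E[f(X)]
  # Emulator: f_mean + sum_j E_X[ D_j f(X) * (X_j - 1[X_j >= x_j]) ]
  function(x) {
    terms <- vapply(seq_len(d), function(j) {
      mean(grad[, j] * (X[, j] - as.numeric(X[, j] >= x[j])))
    }, numeric(1))
    f_mean + sum(terms)
  }
}

# Usage on an additive test function: predictions should track observations.
f_add <- function(x) sum(sin(pi * x))
emu <- build_first_order_emulator(f_add, d = 5)
c(prediction = emu(rep(0.5, 5)), observation = f_add(rep(0.5, 5)))
```

On additive functions such as the one in the usage line, the predictions should approximately reproduce the observations, in line with the behavior reported for the g-function of type B in Section 5.3.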

4. Applications: Computing Sensitivity Indices

In high-dimensional settings, reducing the dimension of functions is often achieved by using screening measures, that is, measures that can be used for quickly identifying non-relevant input variables. Screening measures based on the upper bounds of the total sensitivity indices rely on derivatives [14,23,24,25,60,61]. This section aims at providing optimal computations of upper bounds of the total indices, followed by the computations of the main indices using derivatives.
By evaluating the function f given by Equation (1) at a random vector $\mathbf{X}$ using $G_j = F_j$, one obtains a random vector of the model outputs. Generalized sensitivity indices, including Sobol' indices, rely on the variance–covariance of sensitivity functionals (SFs), which are also random vectors containing the information about the overall contributions of the inputs [14,26,62]. The derivative-based expressions of the SFs are given below (see [14] for more details); here, $\mathbf{X}'$ stands for an i.i.d. copy of $\mathbf{X}$. Given $u \subseteq \{1,\ldots,d\}$, the interaction SF of the inputs $\mathbf{X}_u$ is given by
$$ f_u(\mathbf{X}_u) := \mathbb{E}_{\mathbf{X}'}\left[ D^{|u|} f(\mathbf{X}') \prod_{k \in u} \frac{F_k(X'_k) - \mathbb{1}_{[X'_k \geq X_k]}}{\rho_k(X'_k)} \right], $$
and the first-order SF of $\mathbf{X}_u$ is given by
$$ f_u^{fo}(\mathbf{X}_u) := \sum_{\substack{v \subseteq u \\ |v| > 0}} f_v(\mathbf{X}_v). $$
Likewise, the total-interaction SF of $\mathbf{X}_u$ is given by [14]
$$ f_u^{sup}(\mathbf{X}) := \sum_{\substack{v \subseteq \{1,\ldots,d\} \\ u \subseteq v}} f_v(\mathbf{X}_v) = \mathbb{E}_{\mathbf{X}'}\left[ D^{|u|} f\left(\mathbf{X}'_u, \mathbf{X}_{\sim u}\right) \prod_{k \in u} \frac{F_k(X'_k) - \mathbb{1}_{[X'_k \geq X_k]}}{\rho_k(X'_k)} \right], $$
and the total SF of $\mathbf{X}_u$ is given by [14]
$$ f_u^{tot}(\mathbf{X}) := \sum_{\substack{v \subseteq \{1,\ldots,d\} \\ u \cap v \neq \emptyset}} f_v(\mathbf{X}_v) = \sum_{\substack{v \subseteq u \\ |v| > 0}} \mathbb{E}_{\mathbf{X}'}\left[ D^{|v|} f\left(\mathbf{X}'_u, \mathbf{X}_{\sim u}\right) \prod_{k \in v} \frac{F_k(X'_k) - \mathbb{1}_{[X'_k \geq X_k]}}{\rho_k(X'_k)} \right]. $$
For a single input $X_j$, we have $f_{\{j\}}(X_j) = f_{\{j\}}^{fo}(X_j)$ and $f_{\{j\}}^{tot}(\mathbf{X}) = f_{\{j\}}^{sup}(\mathbf{X})$. Among similarity measures [28,29], taking the variance–covariances of the SFs, that is, $\Sigma_u := \mathbb{V}\left[f_u(\mathbf{X}_u)\right]$, $\Sigma_u^{sup} := \mathbb{V}\left[f_u^{sup}(\mathbf{X})\right]$, and $\Sigma_u^{tot} := \mathbb{V}\left[f_u^{tot}(\mathbf{X})\right]$, leads to [14]
$$ \Sigma_u = \mathbb{E}\left[ D^{|u|} f(\mathbf{X})\, D^{|u|} f(\mathbf{X}') \prod_{k \in u} \frac{F_k\left(\min(X_k, X'_k)\right) - F_k(X_k)\, F_k(X'_k)}{\rho_k(X_k)\, \rho_k(X'_k)} \right]; \qquad (12) $$
$$ \Sigma_u \leq \Sigma_u^{sup} \leq \Sigma_u^{ub} := \frac{1}{2^{|u|}}\, \mathbb{E}\left[ \left( D^{|u|} f(\mathbf{X}) \right)^2 \prod_{k \in u} \left( \frac{F_k(X_k)\left(1 - F_k(X_k)\right)}{\rho_k(X_k)} \right)^2 \right]. \qquad (13) $$
Thus, $\Sigma_u^{ub}$ is an upper bound of $\Sigma_u^{sup}$. Likewise, $\Sigma_j^{ub}$ is an upper bound of $\Sigma_j^{tot}$ (i.e., $\Sigma_j^{tot} \leq \Sigma_j^{ub}$), and it can be used for screening the input variables.
To provide new expressions of the screening measures and the main indices in the following proposition, denote with $\mathbf{V}'$ an i.i.d. copy of $\mathbf{V}$, and assume that
Assumption 3 
(A3).  f ( X ) has finite second-order moments.
Proposition 1. 
Under the assumptions made in Corollary 4, assume that (A1)–(A3) hold. Then,
$$ \Sigma_u = \sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L} C_{\ell_1}^{(|u|)} C_{\ell_2}^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{X}+\beta_{\ell_1}\mathbf{h}\mathbf{V}\right) f\left(\mathbf{X}'+\beta_{\ell_2}\mathbf{h}\mathbf{V}'\right) \prod_{k\in u} \frac{V_k V'_k}{h_k^2\sigma^4}\; \frac{F_k\left(\min(X_k,X'_k)\right)-F_k(X_k)\,F_k(X'_k)}{\rho_k(X_k)\,\rho_k(X'_k)} \right] + O\left(\|\mathbf{h}\|_2^2\right); \qquad (14) $$
$$ \Sigma_u^{ub} = \frac{1}{2^{|u|}} \sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L} C_{\ell_1}^{(|u|)} C_{\ell_2}^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{X}+\beta_{\ell_1}\mathbf{h}\mathbf{V}\right) f\left(\mathbf{X}+\beta_{\ell_2}\mathbf{h}\mathbf{V}'\right) \prod_{k\in u} \frac{V_k V'_k}{h_k^2\sigma^4} \left( \frac{F_k(X_k)\left(1-F_k(X_k)\right)}{\rho_k(X_k)} \right)^2 \right] + O\left(\|\mathbf{h}\|_2^2\right). \qquad (15) $$
Proof. 
Bearing in mind Equations (2), (12) and (13), the proof of Proposition 1 is straightforward.    □
The method of moments allows for deriving the estimators of $\Sigma_u$ and $\Sigma_u^{ub}$ for all $u \subseteq \{1,\ldots,d\}$. For screening the inputs of models, we provide the estimators of $\Sigma_j$ and $\Sigma_j^{ub}$ for any $j \in \{1,\ldots,d\}$. To that end, we are given four independent samples, that is, $\{\mathbf{X}_i\}_{i=1}^{N}$ from $\mathbf{X}$, $\{\mathbf{X}'_i\}_{i=1}^{N}$ from $\mathbf{X}'$, $\{\mathbf{V}_i\}_{i=1}^{N}$ from $\mathbf{V}$, and $\{\mathbf{V}'_i\}_{i=1}^{N}$ from $\mathbf{V}'$. Consistent estimators of $\Sigma_j$ and $\Sigma_j^{ub}$ are, respectively, given by
$$ \widehat{\Sigma_j} := \sum_{i=1}^{N} \sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L} \frac{C_{\ell_1}^{(1)} C_{\ell_2}^{(1)}}{N}\, f\left(\mathbf{X}_i + \beta_{\ell_1} \mathbf{h}\mathbf{V}_i\right) f\left(\mathbf{X}'_i + \beta_{\ell_2} \mathbf{h}\mathbf{V}'_i\right) \frac{V_{i,j}\, V'_{i,j}}{h_j^2 \sigma^4}\; \frac{F_j\left(\min(X_{i,j}, X'_{i,j})\right) - F_j(X_{i,j})\, F_j(X'_{i,j})}{\rho_j(X_{i,j})\, \rho_j(X'_{i,j})}; \qquad (16) $$
$$ \widehat{\Sigma_j^{ub}} := \sum_{i=1}^{N} \sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L} \frac{C_{\ell_1}^{(1)} C_{\ell_2}^{(1)}}{2N}\, f\left(\mathbf{X}_i + \beta_{\ell_1} \mathbf{h}\mathbf{V}_i\right) f\left(\mathbf{X}_i + \beta_{\ell_2} \mathbf{h}\mathbf{V}'_i\right) \frac{V_{i,j}\, V'_{i,j}}{h_j^2 \sigma^4} \left( \frac{F_j(X_{i,j})\left(1 - F_j(X_{i,j})\right)}{\rho_j(X_{i,j})} \right)^2. \qquad (17) $$
The above (direct) estimators require 3LN model runs for obtaining the estimates for all $j \in \{1,\ldots,d\}$. In addition to such estimators, we derive plug-in estimators, which are relevant in the presence of given data about the estimates of the first-order derivatives. To provide such estimators, we denote with $\left\{\widehat{D^{|\{j\}|} f}(\mathbf{X}_i)\right\}_{i=1}^{N_1}$ a sample of $N_1 > 1$ known or estimated values of the first-order derivatives (i.e., $|u| = 1$). Using Equation (2), such estimates are obtained by considering $L = 1$, $L = 2$, or $L = 3$ and a given N. Keeping in mind Equations (12) and (13) and the U-statistic theory for one sample, the plug-in estimator of the main index of $X_j$ is given by
$$ \widehat{\Sigma_j} := \frac{2}{N_1(N_1-1)} \sum_{1 \leq i_1 < i_2 \leq N_1} \widehat{D^{|\{j\}|} f}(\mathbf{X}_{i_1})\, \widehat{D^{|\{j\}|} f}(\mathbf{X}_{i_2})\, \frac{F_j\left(\min(X_{i_1,j}, X_{i_2,j})\right) - F_j(X_{i_1,j})\, F_j(X_{i_2,j})}{\rho_j(X_{i_1,j})\, \rho_j(X_{i_2,j})}. \qquad (18) $$
Likewise, the plug-in estimator of the upper bound of the total index of $X_j$ is given by
$$ \widehat{\Sigma_j^{ub}} := \frac{1}{2 N_1} \sum_{i=1}^{N_1} \left( \widehat{D^{|\{j\}|} f}(\mathbf{X}_i) \right)^2 \left( \frac{F_j(X_{i,j})\left(1 - F_j(X_{i,j})\right)}{\rho_j(X_{i,j})} \right)^2. \qquad (19) $$
Note that the plug-in estimators are consistent and require a total of L N 1 N 0 model runs for computing such indices, where N 0 is the number of model runs used for computing the gradient of f at X i .
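The following R sketch (ours; names are illustrative) computes the plug-in estimators (18) and (19) from a given matrix of first-order derivative estimates, assuming independent $\mathcal{U}(0,1)$ inputs so that $F_j(t) = t$ and $\rho_j = 1$; for other input distributions, $F_j$ and $\rho_j$ should be replaced accordingly.

```r
# Sketch of the plug-in estimators (18) and (19) for independent U(0,1) inputs
# (so F_j(t) = t and rho_j = 1); 'X' is an N1 x d matrix of sample points and
# 'grad' an N1 x d matrix of first-order derivative estimates at those points.
# Names are illustrative, not the paper's code.
plugin_indices <- function(X, grad) {
  N1 <- nrow(X); d <- ncol(X)
  main <- numeric(d); upper <- numeric(d)
  for (j in seq_len(d)) {
    xj <- X[, j]; gj <- grad[, j]
    # Upper bound of the total index, Equation (19).
    upper[j] <- mean(gj^2 * (xj * (1 - xj))^2) / 2
    # Main index, Equation (18): one-sample U-statistic of order two.
    K <- outer(xj, xj, pmin) - outer(xj, xj)    # F_j(min) - F_j(x) F_j(x')
    G <- outer(gj, gj)                          # products of derivative estimates
    main[j] <- (sum(G * K) - sum(diag(G * K))) / (N1 * (N1 - 1))
  }
  list(main = main, upper_bound = upper)
}
```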

5. Illustrations: Screening and Emulators of Models

5.1. Test Functions

5.1.1. Ishigami’s Function ( d = 3 )

The Ishigami function includes three independent inputs following a uniform distribution on $[-\pi, \pi]$, and it is given by
$$ f(\mathbf{x}) = \sin(x_1) + 7 \sin^2(x_2) + 0.1\, x_3^4 \sin(x_1). $$
The sensitivity indices are $S_1 = 0.3139$, $S_2 = 0.4424$, $S_3 = 0.0$, $S_{T_1} = 0.567$, $S_{T_2} = 0.442$, and $S_{T_3} = 0.243$.
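For reference, a direct R implementation of this test function is given below (a convenience sketch, not the paper's code).

```r
# Ishigami test function with independent inputs X_j ~ U(-pi, pi), j = 1, 2, 3.
ishigami <- function(x) sin(x[1]) + 7 * sin(x[2])^2 + 0.1 * x[3]^4 * sin(x[1])

# Example evaluation at a random input:
ishigami(runif(3, min = -pi, max = pi))
```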

5.1.2. Sobol’s g-Function ( d = 10 )

The g-function [63] includes ten independent inputs following a uniform distribution on [ 0 , 1 ] , and it is defined as follows:
$$ f(\mathbf{x}) = \prod_{j=1}^{d=10} \frac{|4 x_j - 2| + a_j}{1 + a_j}. $$
Note that such a function is differentiable almost everywhere. According to the values of a = ( a j , j = 1 , 2 , , d ) , this function has different properties [23]:
  • If $a = [0, 0, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52, 6.52]^T$, the values of the sensitivity indices are $S_1 = S_2 = 0.39$, $S_j = 0.0069$ for $j > 2$, $S_{T_1} = S_{T_2} = 0.54$, and $S_{T_j} = 0.013$ for $j > 2$. Thus, this function has a low effective dimension (function of type A), and it belongs to $\mathcal{A}_s$ with $s > 1$ (see Section 3.4).
  • If $a = [50, 50, 50, 50, 50, 50, 50, 50, 50, 50]^T$, the first-order and total indices are given as follows: $S_j = S_{T_j} = 0.1$ for $j \in \{1, 2, \ldots, d\}$. Thus, all inputs are important, but there is no interaction among these inputs. This function has a high effective dimension (function of type B). Note that it belongs to $\mathcal{A}_1$.
  • If $a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]^T$, the function belongs to the class of functions with important interactions among inputs. Indeed, we have $S_j = 0.02$ and $S_{T_j} = 0.27$ for $j \in \{1, 2, \ldots, d\}$. All the inputs are relevant due to important interactions (function of type C). Then, this function belongs to $\mathcal{A}_s$ with $s \geq 2$.
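A corresponding R sketch of the g-function and of the three coefficient vectors used above is given below (ours, for convenience).

```r
# Sobol' g-function with independent inputs X_j ~ U(0, 1), j = 1, ..., 10,
# and the three coefficient vectors (types A, B, C) used above.
g_function <- function(x, a) prod((abs(4 * x - 2) + a) / (1 + a))

a_type_A <- c(0, 0, rep(6.52, 8))   # low effective dimension
a_type_B <- rep(50, 10)             # all inputs important, no interaction
a_type_C <- rep(0, 10)              # strong interactions

g_function(runif(10), a_type_A)
```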

5.2. Numerical Comparisons of Estimators

This section provides a comparison of the direct and plug-in estimators of the main indices and the upper bounds of the total indices using the test functions of Section 5.1. Different total budgets for the model evaluations are considered in this paper, that is, N = 500 , 1000, 1500, 2000, 3000, 5000, 10,000, 15,000, 20,000. In the case of the plug-in estimators, we used N 0 = 2 d . For generating different random values, Sobol’s sequence (scrambled = 3) from the R-package randtoolbox [64] is used. We replicated each estimation 30 times by randomly choosing the seed, and the reported results are the average of the 30 estimates.
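The following R sketch illustrates how such quasi-random evaluation points can be generated; we assume here that the "scrambled = 3" setting corresponds to the scrambling = 3 argument of randtoolbox::sobol, and the mapping to $\mathcal{U}(-\pi, \pi)$ matches the Ishigami inputs.

```r
# Sketch of the quasi-random sampling (assumption: the "scrambled = 3" setting
# corresponds to the 'scrambling = 3' argument of randtoolbox::sobol); the
# mapping to U(-pi, pi) matches the Ishigami inputs, and the V block gives the
# symmetric perturbations V ~ U(-1, 1)^d.
library(randtoolbox)

N <- 2000; d <- 3
U <- sobol(N, dim = 2 * d, scrambling = 3, seed = 11)
X <- -pi + 2 * pi * U[, 1:d]            # X_i ~ U(-pi, pi)^d
V <- -1 + 2 * U[, (d + 1):(2 * d)]      # V_i ~ U(-1, 1)^d
```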
Figure 1, Figure 2, Figure 3 and Figure 4 show the mean squared errors related to the estimates of the main indices for the Ishigami function, the g-functions of type A, type B, and type C, respectively. Each figure depicts the results for L = 2 and L = 3 . All the figures show the convergence and accuracy of our estimates using either L = 2 or L = 3 .
Likewise, Figure 5, Figure 6, Figure 7 and Figure 8 show the mean squared gaps (differences) between the true total index and its estimated upper bound for the Ishigami function, the g-functions of type A, type B, and type C, respectively.
It turns out that the plug-in estimators outperform the direct ones. Also, increasing the value of L gives the same results. Moreover, the direct estimators associated with L = 1 fail to provide accurate estimates (we do not report such results here). On the contrary, the plug-in estimates using L = 1 are reported in Table 1 for the Ishigami function and in Table 2 for the three types of the g-function. Such results suggest considering L = 1 or L = 2 with $\beta_1 = 1$, $\beta_2 = -1$ for the plug-in estimators when the total budget of model runs is small. For a larger budget of model runs, the direct estimators associated with L = 2 and $\beta_1 = 1$, $\beta_2 = -1$ can be considered as well in practice.

5.3. Emulations of the g-Function of Type B

Based on the results obtained in Section 5.2 (see Table 2), all the inputs are important in the case of the g-function of type B, meaning that dimension reduction is not possible. Also, the estimated upper bounds suggest weak interactions among inputs. As expected, our estimated results confirm that the g-function of type B belongs to $\mathcal{A}_1$. Thus, an emulator of f based only on the first-order derivatives is sufficient. Using this information, we have derived the emulator of that function (i.e., $\widehat{f_{s=1}}(\mathbf{x})$) under the assumptions made in Corollary 5 ($r^* = 0$, $L \geq 2$) and using $G_j = F_j$ with $j = 1, \ldots, d$. For a given L, we used 300 model runs to build the emulator, and Figure 9 depicts the approximations of that function (called predictions) at the sample points involved in the construction of the emulator. Note that the evaluations of f at such sample points (called observations) are not directly used in the construction of such an emulator. It turns out from Figure 9 that $\widehat{f_{s=1}}$ provides predictions that are in line with the observations, showing the accuracy of our emulator.

6. Conclusions

In this paper, we firstly provided (i) stochastic expressions of the cross-partial derivatives of any order, followed by their biases, and (ii) estimators of such expressions. Our estimators of the $|u|$-th cross-partial derivatives ($u \subseteq \{1,\ldots,d\}$) reach the parametric rate of convergence (i.e., $O(N^{-1} d^{|u|})$) by means of a set of $L = |u| + 1$ constraints for the Hölder space of $\alpha$-smooth functions $\mathcal{H}_\alpha$ with $\alpha > |u|$. Moreover, we showed that the upper bounds of the biases of such estimators do not suffer from the curse of dimensionality. Secondly, the proposed surrogates of cross-partial derivatives are used for deriving (i) new derivative-based emulators of simulators or surrogates of models, even when a large number of model inputs contribute to the model outputs, and (ii) new expressions of the main sensitivity indices and of the upper bounds of the total sensitivity indices.
Numerical simulations confirmed the accuracy of our approaches for not only screening the input variables, but also for identifying the class of functions that contains our simulator of interest, such as the class of functions with important or no interaction among inputs. This relevant information allows for designing and building the appropriate emulators of functions. In the case of the g-function of type B, our emulator of this function (based only on the first-order derivatives) provided approximations or predictions that are in line with the observations.
For functions with important interactions or, equivalently, with higher-order cross-partial derivatives, further numerical schemes are necessary to increase the numerical accuracy of the computations of such derivatives and of the predictions. Such perspectives will be investigated in the near future, as well as the computation of the total sensitivity indices using the proposed surrogates of derivatives. Moreover, a theoretical investigation is needed in order to derive parametric rates of convergence of the above estimators that do not suffer from the curse of dimensionality. Working in $\mathbb{C}$ rather than in $\mathbb{R}$ may be helpful.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

I would like to thank the three reviewers for their comments that helped improve my manuscript.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof of Theorem 1

Firstly, for $\boldsymbol{\imath} := (i_1, \ldots, i_d) \in \mathbb{N}^d$, denote $\|\boldsymbol{\imath}\|_1 = i_1 + \cdots + i_d$ and $u := \left(\mathbb{1}_u(1), \ldots, \mathbb{1}_u(d)\right)$. The Taylor expansion of $f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right)$ about $\mathbf{x}$ of order $\alpha$ is given by
$$ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) = \sum_{p=0}^{\alpha} \sum_{\|\boldsymbol{\imath}\|_1 = p} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})}{\boldsymbol{\imath}!}\, \beta_\ell^{p}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}} + O\left( \left\|\beta_\ell \mathbf{h}\mathbf{V}\right\|_1^{\alpha+1} \right). $$
Multiplying such an expansion by the constant $C_\ell^{(|u|)}$ and taking the sum over $\ell = 1, \ldots, L$, the expectation $E := \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right]$ becomes
$$ E = \sum_{p \geq 0} \sum_{\|\boldsymbol{\imath}\|_1 = p} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})}{\boldsymbol{\imath}!} \sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{p}\; \mathbb{E}\left[ \mathbf{V}^{\boldsymbol{\imath} + u} \right] \frac{\mathbf{h}^{\boldsymbol{\imath} - u}}{\sigma^{2|u|}}. $$
We can see that $\mathbb{E}\left[\mathbf{V}^{\boldsymbol{\imath}+u}\right] \neq 0$ iff $\boldsymbol{\imath} + u = 2\mathbf{q}$ with $\mathbf{q} \in \mathbb{N}^d$. The equation $\boldsymbol{\imath} + u = 2\mathbf{q}$ implies $i_k = 2q_k \geq 0$ if $k \notin u$ and $i_k = 2q_k - 1 \geq 0$ otherwise. Thus, writing $i_k = 2q_k + 1$ when $k \in u$ is much more convenient, and it leads to $\boldsymbol{\imath} = 2\mathbf{q} + u$ with $\mathbf{q} \in \mathbb{N}^d$, which also implies that $\|\boldsymbol{\imath}\|_1 \geq \|u\|_1$. We then obtain $D^{|u|} f$ when $\|\mathbf{q}\|_1 = 0$ or $\boldsymbol{\imath} = u$, using the fact that $\mathbb{E}\left[\mathbf{V}^{2u}\right] = \mathbb{E}\left[\prod_{k \in u} V_k^2\right] = \sigma^{2|u|}$ by independence. We can then write
$$ E = D^{|u|} f(\mathbf{x}) \sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{|u|} + \sum_{\substack{r \geq 2 \\ r \text{ even}}} \sum_{\|\mathbf{q}\|_1 = r/2} \frac{D^{(2\mathbf{q}+u)} f(\mathbf{x})}{(2\mathbf{q}+u)!} \sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r+|u|}\; \mathbb{E}\left[ \mathbf{V}^{2(\mathbf{q}+u)} \right] \frac{\mathbf{h}^{2\mathbf{q}}}{\sigma^{2|u|}}, $$
using the change of variable $r = p - \|u\|_1$. At this point, setting $L = 1$, $\beta_1 = 1$, and $C_1^{(|u|)} = 1$ results in an approximation of $D^{|u|} f(\mathbf{x})$ of order $O\left(\|\mathbf{h}\|_2^2\right)$.
Secondly, for $L > 1$, the constraints $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r+|u|} = \delta_{0,r}$ for $r = 0, 2, \ldots, 2(L-1)$ allow us to eliminate higher-order terms so as to reach the order $O\left(\|\mathbf{h}\|_2^{2L}\right)$. One can also use $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r+|u|} = \delta_{0,r}$ for $r = -|u|, \ldots, -|u| + L - 2, 0$ to increase the accuracy of the approximations, while keeping the order $O\left(\|\mathbf{h}\|_2^{2}\right)$ when $-|u| + L - 2 < 0$.

Appendix B. Proof of Corollary 1

  • Let $\mathbf{q} = (q_1, \ldots, q_d) \in \mathbb{N}^d$, $u := \left(\mathbb{1}_u(1), \ldots, \mathbb{1}_u(d)\right) \in \mathbb{N}^d$, and consider the set $\alpha := \left\{ 2\mathbf{q} + u : \|\mathbf{q}\|_1 = L \right\}$. As $f \in \mathcal{H}_{|u|+2L}$, the expansion of $f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right)$ gives
$$ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) = \sum_{\|\boldsymbol{\imath}\|_1 = 0}^{|u|+2L-1} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{\|\boldsymbol{\imath}\|_1}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} + \sum_{\substack{\|\boldsymbol{\imath}\|_1 = |u|+2L \\ \boldsymbol{\imath} \notin \alpha}} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{|u|+2L}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} + R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right), $$
with the remainder term $R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right) := \beta_\ell^{|u|+2L} \sum_{\substack{\|\boldsymbol{\imath}\|_1 = |u|+2L \\ \boldsymbol{\imath} \in \alpha}} \frac{D^{(\boldsymbol{\imath})} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!}$. Thus, $R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right) = \beta_\ell^{|u|+2L} \sum_{\|\mathbf{q}\|_1 = L} \frac{D^{(2\mathbf{q}+u)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right)}{(2\mathbf{q}+u)!} (\mathbf{h}\mathbf{V})^{2\mathbf{q}+u}$ and
$$ R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right) = \beta_\ell^{|u|+2L} (\mathbf{h}\mathbf{V})^{u} \sum_{\|\mathbf{q}\|_1 = L} \frac{D^{(2\mathbf{q}+u)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right)}{(2\mathbf{q}+u)!} \left(\mathbf{h}^2 \mathbf{V}^2\right)^{\mathbf{q}}. $$
Using $E := \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right]$, Theorem 1 implies that the absolute value of the bias $B := E - D^{|u|} f(\mathbf{x}) = \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right]$ satisfies
$$ |B| \leq \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{|u|+2L} \right| M_{|u|+2L}\, \mathbb{E}\left[ \left\|\mathbf{h}^2 \mathbf{V}^2\right\|_1^{L} \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right], $$
since $\left| \sum_{\|\mathbf{q}\|_1 = L} \frac{D^{(2\mathbf{q}+u)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right)}{(2\mathbf{q}+u)!} \left(\mathbf{h}^2 \mathbf{V}^2\right)^{\mathbf{q}} \right| \leq M_{|u|+2L} \left\|\mathbf{h}^2 \mathbf{V}^2\right\|_1^{L}$.
Using $R_k = V_k/\sigma$, the results hold because $\mathbb{E}\left[ \left\|\mathbf{h}^2 \mathbf{V}^2\right\|_1^{L} \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] = \sigma^{2L}\, \mathbb{E}\left[ \left\|\mathbf{h}^2 \mathbf{R}^2\right\|_1^{L} \prod_{k \in u} R_k^2 \right]$.
For $V_k \sim \mathcal{U}(-\xi, \xi)$ with $\xi > 0$ and $k = 1, \ldots, d$, we have $\mathbb{E}\left[ \left\|\mathbf{h}^2 \mathbf{V}^2\right\|_1^{L} \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] \leq \xi^{2L} \left\|\mathbf{h}^2\right\|_1^{L}$.

Appendix C. Proof of Corollary 2

  • Let $\alpha := \left\{ \mathbf{q} + u : \|\mathbf{q}\|_1 = 1 \right\}$. As $f \in \mathcal{H}_{|u|+1}$, we can write
$$ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) = \sum_{\|\boldsymbol{\imath}\|_1 = 0}^{|u|} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{\|\boldsymbol{\imath}\|_1}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} + \sum_{\substack{\|\boldsymbol{\imath}\|_1 = |u|+1 \\ \boldsymbol{\imath} \notin \alpha}} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{|u|+1}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} + R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right), $$
with the remainder term $R_{|u|}\left(\mathbf{h}, \beta_\ell, \mathbf{V}\right) := \beta_\ell^{|u|+1} \sum_{\substack{\|\boldsymbol{\imath}\|_1 = |u|+1 \\ \boldsymbol{\imath} \in \alpha}} \frac{D^{(\boldsymbol{\imath})} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!}$. Using $E := \sum_{\ell=1}^{L} C_\ell^{(|u|)}\, \mathbb{E}\left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right]$ and Theorem 1, the results hold by analogy with the proof of Corollary 1. Indeed, if $R_k := V_k/\sigma$, then
$$ |B| \leq M_{|u|+1}\, \mathbb{E}\left[ \left\|\mathbf{h}\mathbf{V}\right\|_1 \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] \Gamma_{|u|+1} = \sigma M_{|u|+1}\, \mathbb{E}\left[ \left\|\mathbf{h}\mathbf{R}\right\|_1 \prod_{k \in u} R_k^2 \right] \Gamma_{|u|+1} \leq \sigma M_{|u|+1}\, \|\mathbf{h}\|_2\, \mathbb{E}\left[ \left\|\mathbf{R}\right\|_2 \prod_{k \in u} R_k^2 \right] \Gamma_{|u|+1}. $$
For $V_k \sim \mathcal{U}(-\xi, \xi)$ with $\xi > 0$ and $k = 1, \ldots, d$, we have
$$ |B| \leq M_{|u|+1}\, \mathbb{E}\left[ \left\|\mathbf{h}\mathbf{V}\right\|_1 \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] \Gamma_{|u|+1} \leq M_{|u|+1}\, \xi\, \|\mathbf{h}\|_1\, \mathbb{E}\left[ \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] \Gamma_{|u|+1} = M_{|u|+1}\, \|\mathbf{h}\|_1\, \xi\, \Gamma_{|u|+1}. $$

Appendix D. Proof of Theorem 2

As $f \in \mathcal{H}_{|u|+1}$ implies that $f \in \mathcal{H}_{r^*+1}$ with $r^* \leq |u| - 1$, we have
$$ \left| f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) - \sum_{\|\boldsymbol{\imath}\|_1 = 0}^{r^*} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{\|\boldsymbol{\imath}\|_1}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} \right| \leq M_{r^*+1} \left\| \beta_\ell \mathbf{h}\mathbf{V} \right\|_2^{r^*+1}. $$
Using the fact that $\sum_{\ell=1}^{L} C_\ell^{(|u|)} \beta_\ell^{r} = 0$ for $r = 0, 1, \ldots, r^*$, we can write
$$ \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) = \sum_{\ell=1}^{L} C_\ell^{(|u|)} \left[ f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) - \sum_{\|\boldsymbol{\imath}\|_1 = 0}^{r^*} \frac{D^{(\boldsymbol{\imath})} f(\mathbf{x})\, \beta_\ell^{\|\boldsymbol{\imath}\|_1}\, (\mathbf{h}\mathbf{V})^{\boldsymbol{\imath}}}{\boldsymbol{\imath}!} \right], $$
which leads to $\left| \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \right| \leq \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{r^*+1} \right| M_{r^*+1} \left\| \mathbf{h}\mathbf{V} \right\|_2^{r^*+1}$.
By taking the variance of the proposed estimator, we have
$$ \begin{aligned} \mathbb{V}\left[\widehat{D^{|u|} f}(\mathbf{x})\right] &:= \frac{1}{N}\, \mathbb{V}\left[ \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right] \leq \frac{1}{N}\, \mathbb{E}\left[ \left( \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \right)^2 \prod_{k \in u} \frac{V_k^2}{h_k^2 \sigma^4} \right] \\ &\leq \frac{M_{r^*+1}^2 \left( \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{r^*+1} \right| \right)^2}{N \prod_{k \in u} h_k^2}\, \mathbb{E}\left[ \frac{\left\|\mathbf{h}\mathbf{V}\right\|_2^{2(r^*+1)}}{\sigma^{2|u|}} \prod_{k \in u} \frac{V_k^2}{\sigma^2} \right] \leq \frac{M_{r^*+1}^2 \left( \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{r^*+1} \right| \right)^2}{N \sigma^{2(|u|-r^*-1)} \prod_{k \in u} h_k^2}\, \|\mathbf{h}\|_2^{2(r^*+1)}\, \mathbb{E}\left[ \left\|\mathbf{R}\right\|_2^{2(r^*+1)} \prod_{k \in u} R_k^2 \right], \end{aligned} $$
where $R_k = V_k/\sigma$.
If $V_k \sim \mathcal{U}(-\xi, \xi)$, then $\mathbb{V}\left[\widehat{D^{|u|} f}(\mathbf{x})\right] \leq \frac{3^{|u|} M_{r^*+1}^2 \left( \sum_{\ell=1}^{L} \left| C_\ell^{(|u|)} \beta_\ell^{r^*+1} \right| \right)^2}{N \xi^{2(|u|-r^*-1)} \prod_{k \in u} h_k^2}\, \|\mathbf{h}\|_2^{2(r^*+1)}$.
The results hold using Corollary 2 and the fact that $\mathrm{MSE} = B^2 + \mathbb{V}\left[\widehat{D^{|u|} f}(\mathbf{x})\right]$.

Appendix E. Proof of Corollary 3

Let $\eta := |u| - r^* - 1 > 0$, $F_0 := 3^{|u|} M_{r^*+1}^2 \Gamma_{r^*+1}^2\, d^{\,r^*+1}$, and $F_1 := M_{|u|+1}^2 \Gamma_{|u|+1}^2\, d^2$. By minimizing $\xi^2 F_1 h^2 + \frac{F_0}{N \xi^{2\eta} h^{2\eta}}$ with respect to h, we obtain
$$ h_{op} = \frac{1}{\xi} \left( \frac{\eta F_0}{F_1} \right)^{\frac{1}{2\eta+2}} N^{-\frac{1}{2\eta+2}}; \qquad \frac{F_0}{N \xi^{2\eta} h_{op}^{2\eta}} = \frac{F_0}{N}\, N^{\frac{2\eta}{2\eta+2}} \left( \frac{F_1}{\eta F_0} \right)^{\frac{2\eta}{2\eta+2}} = \left( \frac{F_0}{N} \right)^{\frac{1}{\eta+1}} \left( \frac{F_1}{\eta} \right)^{\frac{\eta}{\eta+1}}; $$
and
$$ d^{\frac{r^*+1}{|u|-r^*}}\; d^{\frac{2(|u|-r^*-1)}{|u|-r^*}} = d^{\frac{2|u|-r^*-1}{|u|-r^*}} = d^{\,1+\frac{|u|-1}{|u|-r^*}}. $$

Appendix F. Proof of Theorem 3

The first two results hold by combining the biases obtained in Corollary 1 and the upper bounds of the variance provided in Theorem 2.
For the last result, let $\eta := |u| - r^* - 1 > 0$, $F_0 := 3^{|u|} M_{r^*+1}^2 \Gamma_{r^*+1}^2\, d^{\,r^*+1}$, and $F_1 := M_{|u|+2L'}^2 \Gamma_{|u|+2L'}^2\, d^{2L'}$. By minimizing the last upper bound with respect to h, we obtain $h_{op} := \frac{1}{\xi} \left( \frac{\eta F_0}{2L' F_1} \right)^{\frac{1}{2\eta+4L'}} N^{-\frac{1}{2\eta+4L'}}$ and
$$ \frac{F_0}{N \xi^{2\eta} h_{op}^{2\eta}} = \frac{F_0}{N}\, N^{\frac{2\eta}{2\eta+4L'}} \left( \frac{2L' F_1}{\eta F_0} \right)^{\frac{2\eta}{2\eta+4L'}} = \left( \frac{F_0}{N} \right)^{\frac{2L'}{\eta+2L'}} \left( \frac{2L' F_1}{\eta} \right)^{\frac{\eta}{\eta+2L'}} \propto N^{-\frac{2L'}{|u|-r^*-1+2L'}}\, d^{\frac{2L'\,|u|}{|u|-r^*-1+2L'}}. $$

Appendix G. On Remark 6

The variance is $\mathbb{V}\left[\widehat{D^{|u|} f}(\mathbf{x})\right] = \frac{1}{N}\, \mathbb{V}\left[ \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \prod_{k \in u} \frac{V_k}{h_k \sigma^2} \right]$ and
$$ \mathbb{V}\left[\widehat{D^{|u|} f}(\mathbf{x})\right] \leq \frac{1}{N}\, \mathbb{E}\left[ \left( \sum_{\ell=1}^{L} C_\ell^{(|u|)} f\left(\mathbf{x} + \beta_\ell \mathbf{h}\mathbf{V}\right) \right)^2 \prod_{k \in u} \frac{V_k^2}{h_k^2 \sigma^4} \right] \leq \frac{\|f\|_\infty^2 \sum_{\ell_1=1}^{L}\sum_{\ell_2=1}^{L} \left| C_{\ell_1}^{(|u|)} C_{\ell_2}^{(|u|)} \right|}{N\, h^{2|u|}\, \sigma^{2|u|}}. $$

References

  1. Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. 1951, 22, 400–407. [Google Scholar] [CrossRef]
  2. Fabian, V. Stochastic approximation. In Optimizing Methods in Statistics; Elsevier: Amsterdam, The Netherlands, 1971; pp. 439–470. [Google Scholar]
  3. Nemirovsky, A.; Yudin, D. Problem Complexity and Method Efficiency in Optimization; Wiley & Sons: New York, NY, USA, 1983. [Google Scholar]
  4. Polyak, B.; Tsybakov, A. Optimal accuracy orders of stochastic approximation algorithms. Probl. Peredachi Inf. 1990, 2, 45–53. [Google Scholar]
  5. Cristea, M. On global implicit function theorem. J. Math. Anal. Appl. 2017, 456, 1290–1302. [Google Scholar] [CrossRef]
  6. Lamboni, M. Derivative formulas and gradient of functions with non-independent variables. Axioms 2023, 12, 845. [Google Scholar] [CrossRef]
  7. Morris, M.D.; Mitchell, T.J.; Ylvisaker, D. Bayesian design and analysis of computer experiments: Use of derivatives in surface prediction. Technometrics 1993, 35, 243–255. [Google Scholar] [CrossRef]
  8. Solak, E.; Murray-Smith, R.; Leithead, W.; Leith, D.; Rasmussen, C. Derivative observations in Gaussian process models of dynamic systems. In Advances in Neural Information Processing Systems 15; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
  9. Hoeffding, W. A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 1948, 19, 293–325. [Google Scholar] [CrossRef]
  10. Efron, B.; Stein, C. The jacknife estimate of variance. Ann. Stat. 1981, 9, 586–596. [Google Scholar] [CrossRef]
  11. Sobol, I.M. Sensitivity analysis for non-linear mathematical models. Math. Model. Comput. Exp. 1993, 1, 407–414. [Google Scholar]
  12. Rabitz, H. General foundations of high dimensional model representations. J. Math. Chem. 1999, 25, 197–233. [Google Scholar] [CrossRef]
  13. Saltelli, A.; Chan, K.; Scott, E. Variance-Based Methods, Probability and Statistics; John Wiley and Sons: Hoboken, NJ, USA, 2000. [Google Scholar]
  14. Lamboni, M. Weak derivative-based expansion of functions: ANOVA and some inequalities. Math. Comput. Simul. 2022, 194, 691–718. [Google Scholar] [CrossRef]
  15. Currin, C.; Mitchell, T.; Morris, M.; Ylvisaker, D. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Am. Stat. Assoc. 1991, 86, 953–963. [Google Scholar] [CrossRef]
  16. Oakley, J.E.; O’Hagan, A. Probabilistic sensitivity analysis of complex models: A bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 2004, 66, 751–769. [Google Scholar] [CrossRef]
  17. Conti, S.; O’Hagan, A. Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 2010, 140, 640–651. [Google Scholar] [CrossRef]
  18. Haylock, R.G.; O’Hagan, A.; Bernardo, J.M. On inference for outputs of computationally expensive algorithms with uncertainty on the inputs. In Bayesian Statistics 5: Proceedings of the Fifth Valencia International Meeting; Oxford Academic: Oxford, UK, 1996; Volume 5, pp. 629–638. [Google Scholar]
  19. Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2001, 63, 425–464. [Google Scholar] [CrossRef]
  20. Sudret, B. Global sensitivity analysis using polynomial chaos expansions. Reliab. Eng. Syst. Saf. 2008, 93, 964–979. [Google Scholar] [CrossRef]
  21. Wahba, G. An introduction to (smoothing spline) anova models in rkhs with examples in geographical data, medicine, atmospheric science and machine learning. arXiv 2004, arXiv:math/0410419. [Google Scholar] [CrossRef]
  22. Sobol, I.M.; Kucherenko, S. Derivative based global sensitivity measures and the link with global sensitivity indices. Math. Comput. Simul. 2009, 79, 3009–3017. [Google Scholar] [CrossRef]
  23. Kucherenko, S.; Rodriguez-Fernandez, M.; Pantelides, C.; Shah, N. Monte Carlo evaluation of derivative-based global sensitivity measures. Reliab. Eng. Syst. Saf. 2009, 94, 1135–1148. [Google Scholar] [CrossRef]
  24. Lamboni, M.; Iooss, B.; Popelin, A.-L.; Gamboa, F. Derivative-based global sensitivity measures: General links with Sobol’ indices and numerical tests. Math. Comput. Simul. 2013, 87, 45–54. [Google Scholar] [CrossRef]
  25. Roustant, O.; Fruth, J.; Iooss, B.; Kuhnt, S. Crossed-derivative based sensitivity measures for interaction screening. Math. Comput. Simul. 2014, 105, 105–118. [Google Scholar] [CrossRef]
  26. Lamboni, M. Derivative-based generalized sensitivity indices and Sobol’ indices. Math. Comput. Simul. 2020, 170, 236–256. [Google Scholar] [CrossRef]
  27. Lamboni, M.; Kucherenko, S. Multivariate sensitivity analysis and derivative-based global sensitivity measures with dependent variables. Reliab. Eng. Syst. Saf. 2021, 212, 107519. [Google Scholar] [CrossRef]
  28. Lamboni, M. Measuring inputs-outputs association for time-dependent hazard models under safety objectives using kernels. Int. J. Uncertain. Quantif. 2024, 1–17. [Google Scholar] [CrossRef]
  29. Lamboni, M. Kernel-based measures of association between inputs and outputs using ANOVA. Sankhya A 2024. [CrossRef]
  30. Russi, T.M. Uncertainty Quantification with Experimental Data and Complex System Models; Spring: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  31. Constantine, P.; Dow, E.; Wang, S. Active subspace methods in theory and practice: Applications to kriging surfaces. SIAM J. Sci. Comput. 2014, 36, 1500–1524. [Google Scholar] [CrossRef]
  32. Zahm, O.; Constantine, P.G.; Prieur, C.; Marzouk, Y.M. Gradient-based dimension reduction of multivariate vector-valued functions. SIAM J. Sci. Comput. 2020, 42, A534–A558. [Google Scholar] [CrossRef]
  33. Kucherenko, S.; Shah, N.; Zaccheus, O. Application of Active Subspaces for Model Reduction and Identification of Design Space; Springer: Berlin/Heidelberg, Germany, 2024; pp. 412–418. [Google Scholar]
  34. Kubicek, M.; Minisci, E.; Cisternino, M. High dimensional sensitivity analysis using surrogate modeling and high dimensional model representation. Int. J. Uncertain. Quantif. 2015, 5, 393–414. [Google Scholar] [CrossRef]
  35. Kuo, F.; Sloan, I.; Wasilkowski, G.; Woźniakowski, H. On decompositions of multivariate functions. Math. Comput. 2010, 79, 953–966. [Google Scholar] [CrossRef]
  36. Bates, D.; Watts, D. Relative curvature measures of nonlinearity. J. Royal Stat. Soc. Ser. B 1980, 42, 1–25. [Google Scholar] [CrossRef]
  37. Guidotti, E. calculus: High-dimensional numerical and symbolic calculus in R. J. Stat. Softw. 2022, 104, 1–37. [Google Scholar] [CrossRef]
  38. Le Dimet, F.-X.; Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. Tellus A Dyn. Meteorol. Oceanogr. 1986, 38, 97–110. [Google Scholar] [CrossRef]
  39. Le Dimet, F.X.; Ngodock, H.E.; Luong, B.; Verron, J. Sensitivity analysis in variational data assimilation. J. Meteorol. Soc. Jpn. 1997, 75, 245–255. [Google Scholar] [CrossRef]
  40. Cacuci, D.G. Sensitivity and Uncertainty Analysis—Theory, Chapman & Hall; CRC: Boca Raton, FL, USA, 2005. [Google Scholar]
  41. Gunzburger, M.D. Perspectives in Flow Control and Optimization; SIAM: Philadelphia, PA, USA, 2003. [Google Scholar]
  42. Borzi, A.; Schulz, V. Computational Optimization of Systems Governed by Partial Differential Equations; SIAM: Philadelphia, PA, USA, 2012. [Google Scholar]
  43. Ghanem, R.; Higdon, D.; Owhadi, H. Handbook of Uncertainty Quantification; Springer International Publishing: Berlin/Heidelberg, Germany, 2017. [Google Scholar]
  44. Wang, Z.; Navon, I.M.; Le Dimet, F.-X.; Zou, X. The second order adjoint analysis: Theory and applications. Meteorol. Atmos. Phys. 1992, 50, 3–20. [Google Scholar] [CrossRef]
  45. Agarwal, A.; Dekel, O.; Xiao, L. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Proceedings of the 23rd Conference on Learning Theory, Haifa, Israel, 27–29 June 2010; pp. 28–40. [Google Scholar]
  46. Bach, F.; Perchet, V. Highly-smooth zero-th order online optimization. In Proceedings of the 29th Annual Conference on Learning Theory, New York, NY, USA, 23–26 June 2016; Feldman, V., Rakhlin, A., Shamir, O., Eds.; Volume 49, pp. 257–283. [Google Scholar]
  47. Akhavan, A.; Pontil, M.; Tsybakov, A.B. Exploiting Higher Order Smoothness in Derivative-Free Optimization and Continuous Bandits, NIPS’20; Curran Associates Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
  48. Lamboni, M. Optimal and efficient approximations of gradients of functions with nonindependent variables. Axioms 2024, 13, 426. [Google Scholar] [CrossRef]
  49. Patelli, E.; Pradlwarter, H. Monte Carlo gradient estimation in high dimensions. Int. J. Numer. Methods Eng. 2010, 81, 172–188. [Google Scholar] [CrossRef]
  50. Prashanth, L.; Bhatnagar, S.; Fu, M.; Marcus, S. Adaptive system optimization using random directions stochastic approximation. IEEE Trans. Autom. Control. 2016, 62, 2223–2238. [Google Scholar]
  51. Agarwal, N.; Bullins, B.; Hazan, E. Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 2017, 18, 4148–4187. [Google Scholar]
  52. Zhu, J.; Wang, L.; Spall, J.C. Efficient implementation of second-order stochastic approximation algorithms in high-dimensional problems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3087–3099. [Google Scholar] [CrossRef] [PubMed]
  53. Zhu, J. Hessian estimation via stein’s identity in black-box problems. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, Online, 15–17 August 2022; Bruna, J., Hesthaven, J., Zdeborova, L., Eds.; Volume 145 of Proceedings of Machine Learning Research, PMLR. pp. 1161–1178. [Google Scholar]
  54. Erdogdu, M.A. Newton-stein method: A second order method for glms via stein’ s lemma. In Advances in Neural Information Processing Systems; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
  55. Stein, C.; Diaconis, P.; Holmes, S.; Reinert, G. Use of exchangeable pairs in the analysis of simulations. Lect.-Notes-Monogr. Ser. 2004, 46, 1–26. [Google Scholar]
  56. Zemanian, A. Distribution Theory and Transform Analysis: An Introduction to Generalized Functions, with Applications, Dover Books on Advanced Mathematics; Dover Publications: Mineola, NY, USA, 1987. [Google Scholar]
  57. Strichartz, R. A Guide to Distribution Theory and Fourier Transforms, Studies in Advanced Mathematics; CRC Press: Boca, FL, USA, 1994. [Google Scholar]
  58. Rawashdeh, E. A simple method for finding the inverse matrix of Vandermonde matrix. Math. Vesn. 2019, 71, 207–213. [Google Scholar]
  59. Arafat, A.; El-Mikkawy, M. A fast novel recursive algorithm for computing the inverse of a generalized Vandermonde matrix. Axioms 2023, 12, 27. [Google Scholar] [CrossRef]
  60. Morris, M. Factorial sampling plans for preliminary computational experiments. Technometrics 1991, 33, 161–174. [Google Scholar] [CrossRef]
  61. Roustant, O.; Barthe, F.; Iooss, B. Poincaré inequalities on intervals-application to sensitivity analysis. Electron. J. Stat. 2017, 11, 3081–3119. [Google Scholar] [CrossRef]
  62. Lamboni, M. Multivariate sensitivity analysis: Minimum variance unbiased estimators of the first-order and total-effect covariance matrices. Reliab. Eng. Syst. Saf. 2019, 187, 67–92. [Google Scholar] [CrossRef]
  63. Homma, T.; Saltelli, A. Importance measures in global sensitivity analysis of nonlinear models. Reliab. Eng. Syst. Saf. 1996, 52, 1–17. [Google Scholar] [CrossRef]
  64. Dutang, C.; Savicky, P. R Package, version 1.13. Randtoolbox: Generating and Testing Random Numbers. The R Foundation: Vienna, Austria, 2013. [Google Scholar]
Figure 1. Average of d = 3 mean squared errors using the Ishigami function (○ the direct estimator (16) and + for the plug-in estimator (18)).
Figure 2. Average of d = 10 mean squared errors using the g-function of type A (○ the direct estimator (16) and + for the plug-in estimator (18)).
Figure 3. Average of d = 10 mean squared errors using the g-function of type B (○ the direct estimator (16) and + for the plug-in estimator (18)).
Figure 4. Average of d = 10 mean squared errors using the g-function of type C (○ the direct estimator (16) and + for the plug-in estimator (18)).
Figure 5. Average of d = 3 mean squared gaps using the Ishigami function (○ the direct estimator (17) and + for the plug-in estimator (19)).
Figure 6. Average of d = 10 mean squared gaps using the g-function of type A (○ the direct estimator (17) and + for the plug-in estimator (19)).
Figure 7. Average of d = 10 mean squared gaps using the g-function of type B (○ the direct estimator (17) and + for the plug-in estimator (19)).
Figure 8. Average of d = 10 mean squared gaps using the g-function of type C (○ the direct estimator (17) and + for the plug-in estimator (19)).
Figure 9. Predictions of the g-function of type B using the emulator $\widehat{f_{s=1}}$ versus observations, using L = 2 and L = 3.
Table 1. Average of 30 estimates of the main indices and upper bounds of total indices for the Ishigami function using the plug-in estimators, L = 1, and 2000 model runs.

          X1        X2        X3
S_j       0.249     0.318     -0.006
UB_j      1.420     4.872     0.711
Table 2. Average of 30 estimates of the main indices and upper bounds of total indices for the g-functions using the plug-in estimators, L = 1, and 2000 model runs.

          X1      X2      X3      X4      X5      X6      X7      X8      X9      X10
Type A
S_j       0.330   0.324   0.006   0.005   0.006   0.006   0.005   0.005   0.006   0.005
UB_j      2.022   2.005   0.045   0.046   0.047   0.047   0.046   0.046   0.046   0.047
Type B
S_j       0.085   0.085   0.085   0.085   0.085   0.085   0.085   0.085   0.085   0.085
UB_j      0.362   0.363   0.363   0.362   0.363   0.362   0.363   0.362   0.363   0.363
Type C
S_j       0.028   0.028   0.032   0.032   0.035   0.041   0.031   0.030   0.036   0.034
UB_j      2.033   1.301   1.825   1.605   1.634   1.641   2.216   1.526   1.793   1.503
