1. Introduction: Problem Setting
Environmental security is a topic of major importance for modern society, and the development of reliable and sustainable mathematical models plays a significant role in this area. Identifying the most influential factors, such as chemical reaction rates, boundary conditions, and emission levels, through sensitivity analysis techniques provides invaluable insights for enhancing the model’s performance. This process not only aids in pinpointing areas requiring more detailed scrutiny but also guides improvements in the overall model structure and parametrization. By understanding which factors have the greatest impact on model outputs, researchers can prioritize efforts to refine these critical components, leading to more accurate and reliable predictions. This, in turn, increases the reliability and robustness of predictions obtained from large-scale environmental and climate models.
The input data for the sensitivity analyses were obtained during runs of a large-scale mathematical model for the remote transport of air pollutants, the Unified Danish Eulerian Model (UNI-DEM), developed at the Danish National Environmental Research Institute (http://www2.dmu.dk/AtmosphericEnvironment/DEM/, accessed on 1 April 2015) [1].
A mathematical model describes an initial problem, frequently expressed through partial differential equations (PDEs) or analogous mathematical structures. Once established, this model undergoes discretization via methods like finite elements, finite differences, or boundary elements. Significantly, after discretization, some extensive mathematical models can comprise billions or even trillions of algebraic equations. Solving these models can require multiple days, even with the use of high-performance computing systems. Hence, the creation of efficient algorithms becomes essential.
Comprehensive models depict both critical and less significant phenomena concerning the primary output of the model. Despite equal treatment, it can transpire that addressing a minor phenomenon demands substantial computational resources yet contributes marginally to the solution. An illustration of this situation is provided in this study.
Moreover, the model’s output might exhibit high sensitivity to a singular measurable input parameter. Possessing prior knowledge regarding the importance of certain parameters can prove immensely advantageous. For example, understanding that a particular parameter substantially influences the model’s result implies that it should be measured with increased accuracy. This might necessitate resource allocation for acquiring more precise measuring devices, consequently elevating the overall quality and dependability of the model’s forecasts.
Let us assume the mathematical model at hand can be described using a function

$u = f(\mathbf{x}), \quad \mathbf{x} = (x_1, x_2, \dots, x_d) \in U^d \equiv [0, 1]^d,$  (1)

where $\mathbf{x}$ is the vector of input parameters with a joint probability density function (PDF) $p(\mathbf{x}) = p(x_1, \dots, x_d)$. Let us assume that the input variables are independent (non-correlated) and the density function $p(\mathbf{x})$ is known, even if $x_1, \dots, x_d$ are not actually random variables. This implies that the output $u$ is also a random variable, as it is a function of the random vector $\mathbf{x}$, with its own PDF. The above presentation is quite general. One must consider that, in most cases of large-scale modeling, the function $f$ is not explicitly available. Frequently, the function $u = f(\mathbf{x})$ represents a solution to a system of partial differential equations with specified boundary and initial conditions. These initial conditions are crucial for ensuring the existence of a unique solution to the system. When dealing with large-scale problems involving multiple parameters in differential equations, demonstrating the existence of a unique solution becomes more difficult. In such cases, it is common to assume the existence of a unique solution and proceed by discretizing the problem using a suitable numerical method.
Global sensitivity analysis (GSA) offers significant benefits for both model developers and users, enabling them to quantify uncertainties in model outputs, assess the relative contributions of various input factors to these uncertainties, and prioritize efforts for their reduction. There are several critical stages involved in conducting a reliable and efficient sensitivity analysis. One of the first steps is metamodeling or approximation, which serves as a vital connection between generating experimental data and applying mathematical techniques for sensitivity analysis. Accurate approximation of data is essential for the overall reliability of the resulting sensitivity indices; therefore, identifying an effective method for approximating discrete functions is crucial. Detailed information on the approximation stage of the mathematical model under consideration can be found in [2].
The subsequent important step in performing sensitivity analysis involves selecting an appropriate technique (e.g., local approaches like one-at-a-time experiments, screening methods, variance-based methods, or derivative-based global sensitivity approaches [3]). This generally requires robust sampling techniques, such as Latin hypercube sampling, importance sampling, stratified sampling, and low-discrepancy sequences.
A strong contender for reliably analyzing models with nonlinearities is the variance-based approach [4]. Its core principle is to estimate how variations in an input parameter or a group of inputs contribute to the variance of the model output. To measure this, the total sensitivity index (TSI) is employed (see Section 1.5). We consider this measure more suitable and reliable than others when multi-component analysis is required. Mathematically, it is defined via multidimensional integrals of the form

$I = \int_{U^d} f(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x},$  (2)

where $f(\mathbf{x})$ is a square-integrable function in $U^d$ and $p(\mathbf{x}) \ge 0$ is a probability density function such that $\int_{U^d} p(\mathbf{x})\, d\mathbf{x} = 1$. The advantages and reasons to choose a variance-based approach for SA in the current work are provided in Section 1.4. A detailed description of the Sobol’ approach (a well-known and efficient variance-based approach for SA) is presented in Section 1.5. Following Equation (2), one can conclude that the last crucial step in providing reliable SA is to choose an efficient Monte Carlo (stochastic) algorithm for multidimensional integration, so that the sensitivity indices of the inputs are computed in a reliable way.
From an environmental security perspective, it is crucial to examine the impact of variations in chemical reaction rates and emission levels on the results generated by the UNI-DEM. Conducting such an analysis can yield valuable insights that serve as a solid foundation for making well-informed assumptions, reasonable simplifications, or identifying parameters whose accuracy needs enhancement. This is because the model’s outputs can be highly sensitive to fluctuations in these parameters.
The primary objective of our research is to enhance the reliability of the model’s outputs by employing efficient Monte Carlo algorithms (MCAs) that are optimal for a specific category of integrands. In certain scenarios, an issue termed “loss of accuracy” arises when attempting to compute small sensitivity indices. This study introduces and evaluates specialized Monte Carlo techniques designed to mitigate these challenges, thereby improving the overall accuracy and dependability of the model’s results.
The main tasks of the present work are to
- Study the sensitivity of the concentration levels of key pollutants to variations in chemical reaction rates and emission levels;
- Apply unimprovable Monte Carlo algorithms for numerical integration to perform Sobol’ variance-based sensitivity analysis;
- Compare the developed Monte Carlo algorithms based on Sobol’ quasi-random points [2] with four existing approaches for multidimensional integration:
  – the Sobol’ approach carried out by a Gray code implementation and sets of direction numbers proposed by Joe and Kuo [5];
  – Owen’s scrambling [6] taken from the collection of the NAG C Library [7];
  – the eFAST approach carried out via SIMLAB [8];
  – the plain Monte Carlo algorithm [9];
- Demonstrate the superior efficiency of the algorithms proposed by the authors in the current study;
- Provide practical insights into the case study at hand;
- Offer operational guidelines for estimating relatively small Sobol’ indices in the presence of computational difficulties.
1.1. Computational Complexity of Algorithms
The selection of the most efficient numerical solution algorithm for solving large-scale problems is indeed crucial. Efficiency in this context typically refers to an algorithm’s ability to achieve a desired level of accuracy while minimizing computational complexity. Computational complexity is often measured by the number of arithmetic operations required to reach a specified accuracy level.
Here are some key points to consider when choosing the most efficient algorithm:
Accuracy: The algorithm should be able to provide solutions within a predefined acceptable error margin. This could be a fixed error for deterministic algorithms or a probabilistic error bound for stochastic algorithms.
Computational Complexity: Algorithms can be classified based on their time and space complexity. Time complexity measures how the execution time grows as the input size increases, while space complexity measures how much memory is used. Lower complexities generally indicate better efficiency.
Deterministic vs. Stochastic Algorithms: Deterministic algorithms produce the same output for the same input every time they run. Stochastic algorithms introduce randomness and might produce different outputs for the same input but can offer advantages like faster convergence or simpler implementations.
Arithmetic Operations: Counting the number of arithmetic operations provides a good estimate of an algorithm’s computational complexity. Fewer arithmetic operations usually mean less computation time and lower resource usage.
Memory Usage: In addition to arithmetic operations, the amount of memory required by an algorithm can also impact its efficiency. Some algorithms require significant amounts of memory, which can limit their applicability on systems with limited resources.
Communication Costs: For distributed computing environments, communication costs between nodes can significantly affect overall performance. Efficient algorithms minimize these costs.
Hybrid Approaches: Sometimes, combining multiple algorithms or using hybrid methods can lead to improved efficiency. For example, using a fast but approximate method initially followed by a slower but more accurate method can balance speed and precision.
In summary, the most efficient numerical solution algorithm is the one that balances accuracy, computational complexity, memory usage, and other relevant factors to best suit the specific requirements of the problem at hand. It is important to note that there is no universally “best” algorithm; the optimal choice depends heavily on the specifics of the problem being solved and the available computational resources.
One could consider comparing two classes of algorithms: deterministic algorithms and randomized (Monte Carlo) algorithms. Let $I$ be the desired value of the integral. Assume that, for a given random variable $\theta$, the mathematical expectation satisfies $E\theta = I$. Suppose that the mean value of $n$ values of $\theta$,

$\bar{\theta}_n = \frac{1}{n} \sum_{i=1}^{n} \theta_i,$

is considered as a Monte Carlo approximation to the solution, where $\theta_1, \dots, \theta_n$ correspond to values (realizations) of the random variable (RV) $\theta$. Generally, a randomized algorithm may produce results with a given probability of error. When working with randomized algorithms, one must acknowledge that the computational result will be correct only with a certain (although high) probability. In most practical computations, it is acceptable to tolerate an error estimate that holds with a probability less than 1.
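As an illustration of this setting, the following minimal sketch (our own, not code from the study) implements the plain Monte Carlo estimator $\bar{\theta}_n$ for an integral over the unit cube; the function name and the test integrand are hypothetical.

```python
import numpy as np

def plain_monte_carlo(f, d, n, rng=None):
    """Plain Monte Carlo estimate of the integral of f over the unit cube [0,1]^d.

    theta_i = f(x_i) with x_i uniformly distributed pseudo-random points, so the
    sample mean is an unbiased estimator of the integral and its error decreases
    like O(n^{-1/2}) regardless of the dimension d.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.random((n, d))              # n uniform pseudo-random points in [0,1]^d
    theta = f(x)                        # realizations of the random variable theta
    return theta.mean(), theta.std(ddof=1) / np.sqrt(n)   # estimate and its standard error

# Example: estimate of the integral of exp(x1 + x2 + x3) over [0,1]^3, whose exact value is (e - 1)^3
estimate, std_err = plain_monte_carlo(lambda x: np.exp(x.sum(axis=1)), d=3, n=100_000)
```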
Consider the following integration problem:

$S(f) := \int_{U^d} f(\mathbf{x})\, d\mathbf{x},$  (3)

where $\mathbf{x} \equiv (x_1, \dots, x_d) \in U^d \subset \mathbb{R}^d$ and $f$ is an integrable function on $U^d$. The computational problem can be considered as a mapping of the function $f$ to the value $S(f)$:

$S : f \longmapsto S(f) \in \mathbb{R},$

where $f \in F_0 \subset C(U^d)$ and $U^d \equiv [0, 1]^d$. We denote by S the solution operator. The elements of $F_0$ constitute the input data for which the problem must be solved; specifically, for any $f \in F_0$, the value $S(f)$ must be computed.
In certain instances, there is interest in cases where the integrand f possesses higher regularity. This frequently occurs in practical computations, where f tends to be smooth and have bounded high-order derivatives. Under these circumstances, it is advantageous to exploit this smoothness. To accomplish this, we introduce the functional class $W^k$, defined as the set of functions $f$ that belong to the Sobolev space $W^k(U^d)$ and have a bounded norm $\| f \|_{W^k(U^d)}$. Here, $U^d$ denotes the domain of interest, and $W^k(U^d)$ refers to the Sobolev space of functions on $U^d$ with up to $k$-th order weak derivatives. The notation $\| \cdot \|_{W^k(U^d)}$ symbolizes the norm associated with the Sobolev space $W^k(U^d)$. By defining the class in this fashion, we can effectively capture the smoothness characteristics of the function f, thereby potentially enhancing the precision and computational efficiency of numerical methods designed for such smooth integrands.
Definition 1. Let $d$ and $k$ be integers, $d, k \ge 1$. We consider the class $W^k(L; U^d)$ (sometimes abbreviated to $W^k$) of real functions $f$ defined over the unit cube $U^d = [0, 1)^d$, possessing all the partial derivatives

$\frac{\partial^{r} f(\mathbf{x})}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}, \quad \alpha_1 + \cdots + \alpha_d = r \le k,$

which are continuous when $r < k$ and bounded in sup norm when $r = k$. The seminorm $L$ on $W^k$ is defined as

$L = \sup \left\{ \left| \frac{\partial^{k} f(\mathbf{x})}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} \right| : \ \alpha_1 + \cdots + \alpha_d = k, \ \mathbf{x} \in U^d \right\}.$

We keep the seminorm $L$ in the notation for the functional class $W^k(L; U^d)$ since it is important for our further consideration. We call a quadrature formula any expression of the form

$A_n(f) = \sum_{i=1}^{n} c_i\, f(\mathbf{x}^{(i)}),$

which approximates the value of the integral $S(f)$. The real numbers $c_i$ are called weights and the $d$-dimensional points $\mathbf{x}^{(i)} \in U^d$ are called nodes. It is clear that, for fixed weights $c_i$ and nodes $\mathbf{x}^{(i)}$, the quadrature formula $A_n$ may be used to define an algorithm with an integration error $err(f, A_n) = S(f) - A_n(f)$. We call a randomized quadrature formula any formula of the following kind:

$A_n^{R}(f) = \sum_{i=1}^{n} \sigma_i\, f(\xi^{(i)}),$

where $\sigma_i$ and $\xi^{(i)}$ are random weights and nodes, respectively. The algorithm $A_n^{R}$ belongs to the randomized class (Monte Carlo), denoted by $\mathcal{A}^{R}$.
Definition 2. Given a randomized (Monte Carlo) integration formula $A_n^{R}$ for the functions from the space $W^k(L; U^d)$, we define the integration error $err(f, A_n^{R}) = S(f) - A_n^{R}(f)$, the probability error $r_n(f)$ in the sense that $r_n(f)$ is the least possible real number such that

$\Pr\left( \left| err(f, A_n^{R}) \right| \le r_n(f) \right) \ge P,$

and the mean square error

$r_n^{(2)}(f) = \left( E\left[ err(f, A_n^{R}) \right]^{2} \right)^{1/2}.$

We assume that it suffices to obtain an $\varepsilon$-approximation to the solution with a probability $0 < P < 1$. If we allow equality, i.e., $P = 1$, in Definition 2, then $r_n(f)$ can be used as an accuracy measure for both randomized and deterministic algorithms. In such a way, it is consistent to consider a wider class of algorithms that encompasses both classes: deterministic and randomized algorithms.
Definition 3. Consider the set of algorithms $A$,

$\mathcal{A} := \{ A : \ \Pr( r_n \le \varepsilon ) \ge c \},$

that solve a given problem with an integration error $r_n$, for a given accuracy $\varepsilon$ and a given constant $c < 1$.
In such a setting, it is correct to compare randomized algorithms with algorithms based on low-discrepancy sequences like Sobol’ $LP_\tau$-sequences.
1.2. Sobol’ Sequences
$LP_\tau$-sequences are uniformly distributed sequences (UDSs). The term UDS was introduced by Hermann Weyl in 1916 [10]. For practical applications, it is essential to identify a UDS that satisfies three key requirements [11,12]:
1. Best Asymptotic Behavior: As n approaches infinity, the sequence should exhibit optimal asymptotic behavior.
2. Well-Distributed Points: For smaller values of n, the points should be evenly distributed across the domain.
3. Computational Efficiency: The algorithm used to generate the sequence should be computationally inexpensive.
By ensuring that these criteria are met, we can guarantee the effectiveness and practicality of the chosen UDSs in various computational tasks.
All $LP_\tau$-sequences given in [12] satisfy the first requirement. Suitable distributions such as $LP_\tau$-sequences are also called $(t, m, s)$-nets and $(t, s)$-sequences in base $b \ge 2$. To introduce them, define first an elementary $s$-interval in base $b$ as a subset of $U^s = [0, 1]^s$ of the form

$E = \prod_{j=1}^{s} \left[ \frac{a_j}{b^{d_j}}, \frac{a_j + 1}{b^{d_j}} \right),$

where $a_j, d_j \ge 0$ are integers and $a_j < b^{d_j}$ for all $j \in \{1, \dots, s\}$. Given two integers $0 \le t \le m$, a $(t, m, s)$-net in base $b$ is a sequence $x^{(i)}$ of $b^m$ points of $U^s$ such that

$\mathrm{Card}\left\{ x^{(i)} \in E \right\} = b^{t}$

for any elementary interval $E$ in base $b$ of hypervolume $\lambda(E) = b^{t-m}$. Given a non-negative integer $t$, a $(t, s)$-sequence in base $b$ is an infinite sequence of points $x^{(i)}$ such that, for all integers $k \ge 0$ and $m \ge t$, the sequence $x^{(k b^m)}, \dots, x^{((k+1) b^m - 1)}$ is a $(t, m, s)$-net in base $b$.
I. M. Sobol’ [11] defines his $\Pi_\tau$-meshes and $LP_\tau$-sequences, which are $(t, m, s)$-nets and $(t, s)$-sequences in base 2, respectively. The terms $(t, m, s)$-nets and $(t, s)$-sequences in base $b$ (also called Niederreiter sequences) were introduced in 1988 by H. Niederreiter [13].
To generate the $j$-th component of the points in a Sobol’ sequence, we need to choose a primitive polynomial of some degree $s_j$ over the Galois field of two elements, GF(2),

$P_j = x^{s_j} + a_{1,j}\, x^{s_j - 1} + a_{2,j}\, x^{s_j - 2} + \cdots + a_{s_j - 1, j}\, x + 1,$

where the coefficients $a_{1,j}, \dots, a_{s_j - 1, j}$ are either 0 or 1. GF(2) is the unique field with two elements, $\{0, 1\}$, where the addition is defined equivalently to the logical XOR operation and the multiplication to the logical AND operation, respectively. A sequence of positive integers $\{m_{1,j}, m_{2,j}, \dots\}$ is defined by the recurrence relation

$m_{k,j} = 2 a_{1,j}\, m_{k-1,j} \oplus 2^{2} a_{2,j}\, m_{k-2,j} \oplus \cdots \oplus 2^{s_j - 1} a_{s_j - 1, j}\, m_{k - s_j + 1, j} \oplus 2^{s_j} m_{k - s_j, j} \oplus m_{k - s_j, j},$

where ⊕ is the bit-by-bit exclusive-or operator. The initial values $m_{1,j}, \dots, m_{s_j,j}$ can be chosen freely provided that each $m_{k,j}$, $1 \le k \le s_j$, is odd and less than $2^{k}$. Therefore, it is possible to construct different Sobol’ sequences for a fixed dimension $s$. In practice, these numbers must be chosen very carefully to obtain very efficient Sobol’ sequence generators [14]. The so-called direction numbers $\{v_{k,j}\}$ are defined by $v_{k,j} = m_{k,j} / 2^{k}$. Then, the $j$-th component of the $i$-th point in a Sobol’ sequence is given by

$x_{i,j} = i_1 v_{1,j} \oplus i_2 v_{2,j} \oplus \cdots,$

where $i_k$ is the $k$-th binary digit of $i = (\dots i_3 i_2 i_1)_2$.
Subroutines to compute these points can be found in [15,16]. The work [17] contains more details.
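To make the construction above concrete, the following sketch generates the coordinates of a Sobol’-type sequence from user-supplied primitive-polynomial data, using the Gray-code update mentioned in the task list. The function name is ours, and the polynomial/initial-value pairs in the example are illustrative placeholders rather than the carefully selected Joe and Kuo direction-number sets [5] used in the actual comparisons.

```python
def sobol_coordinates(n, dims, nbits=30):
    """Generate n points of a Sobol'-type sequence (Gray-code implementation).

    dims is a list of (s, a, m) tuples, one per coordinate: s is the degree of the
    primitive polynomial, a = [a_1, ..., a_{s-1}] are its inner coefficients, and
    m = [m_1, ..., m_s] are the freely chosen odd initial values.
    """
    points = [[0.0] * len(dims) for _ in range(n)]
    for j, (s, a, m) in enumerate(dims):
        m = list(m)
        # extend m_k via the recurrence with bit-by-bit XOR
        for k in range(s, nbits):
            new = m[k - s] ^ (m[k - s] << s)
            for q in range(1, s):
                if a[q - 1]:
                    new ^= m[k - q] << q
            m.append(new)
        # direction numbers as integers: v_k = m_k * 2^(nbits - k)
        v = [m[k] << (nbits - 1 - k) for k in range(nbits)]
        x = 0
        for i in range(1, n + 1):
            g, c = i - 1, 0
            while g & 1:            # c = index of the lowest zero bit of i-1 (Gray-code rule)
                g >>= 1
                c += 1
            x ^= v[c]               # one XOR with a direction number per new point
            points[i - 1][j] = x / 2.0 ** nbits
    return points

# Illustrative two-coordinate example (placeholder parameters, not the Joe-Kuo tables)
pts = sobol_coordinates(8, dims=[(1, [], [1]), (2, [1], [1, 3])])
```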
1.3. Randomized Quasi-Monte Carlo (RQMC)
Instead of employing randomized (Monte Carlo) algorithms for computing the mentioned sensitivity parameters, one can consider deterministic quasi-Monte Carlo algorithms or randomized quasi-Monte Carlo methods [
18,
19]. Randomized (Monte Carlo) algorithms have proven highly efficient in solving multidimensional integrals over composite domains [
9,
11]. Simultaneously, quasi-Monte Carlo (QMC) methods based on well-distributed Sobol’ sequences present a compelling alternative to Monte Carlo algorithms, particularly for smooth integrands and relatively low effective dimensions (up to
) [
20,
21,
22].
Sobol’ $LP_\tau$-sequences are strong contenders for efficient QMC algorithms. Although these algorithms are deterministic, they emulate the pseudo-random sequences used in Monte Carlo integration. A significant limitation of $LP_\tau$-sequences is their potential for suboptimal two-dimensional projections, implying that the distribution of points can deviate substantially from uniformity. If the computational problem involves such projections, the lack of uniformity could result in a considerable loss of accuracy. To mitigate this issue, randomized QMC methods can be employed. Several randomization techniques exist, with scrambling being a notable example. The primary motivation behind scrambling [6,23] was to enhance the uniformity of quasi-random sequences in high dimensions, as assessed through two-dimensional projections. Moreover, scrambling offers a simple and unified way to generate quasi-random numbers for parallel, distributed, and grid-based computational environments. Essentially, scrambled algorithms can be regarded as Monte Carlo algorithms with a specific choice of the density function. Thus, it is logical to compare the two classes of algorithms: deterministic and randomized.
Various versions of scrambling methods exist, based on digital permutations, and their differences lie in the definitions of the permutation functions. Examples include Owen’s nested scrambling [6,24], Tezuka’s generalized Faure sequences [25], and Matousek’s linear scrambling [26]. Following the introduction of Niederreiter sequences [13], Owen [6] and Tezuka [25] independently developed two influential scrambling methods for $(t, s)$-sequences in 1994. Owen specifically highlighted that scrambling can be employed to generate error estimates for quasi-Monte Carlo (QMC) methods. Numerous other techniques for scrambling $(t, s)$-sequences have since been proposed, many of which represent modifications or simplifications of the original Owen and Tezuka schemes. Owen’s method is particularly effective for $(t, m, s)$-nets, whereas the Tezuka algorithm was shown to be efficient for $(0, s)$-sequences. Most existing scrambling methods involve randomizing a single digit at a time. In contrast, the approach presented in [27] randomizes multiple digits within a single point simultaneously, offering enhanced efficiency when utilizing standard pseudo-random number generators as scramblers.
Owen’s scrambling [6], also known as nested scrambling, was developed to provide a practical error estimate for QMC by treating each scrambled sequence as a separate and independent random sample from a family of randomly scrambled quasi-random numbers. Let $x^{(i)} = (x_{i,1}, \dots, x_{i,s})$ be quasi-random numbers in $[0, 1)^s$, and let $z^{(i)} = (z_{i,1}, \dots, z_{i,s})$ be the scrambled version of the point $x^{(i)}$. Suppose that each $x_{i,j}$ can be represented in base $b$ as $x_{i,j} = (0.\, x_{i,j}^{(1)} x_{i,j}^{(2)} \dots x_{i,j}^{(K)})_b$, with $K$ being the number of digits to be scrambled. Then, the nested scrambling proposed by Owen [6,24] can be defined as follows: $z_{i,j}^{(1)} = \pi(x_{i,j}^{(1)})$ and $z_{i,j}^{(k)} = \pi_{x_{i,j}^{(1)} \dots x_{i,j}^{(k-1)}}(x_{i,j}^{(k)})$ for $k \ge 2$, with independent random permutations $\pi$ and $\pi_{x_{i,j}^{(1)} \dots x_{i,j}^{(k-1)}}$ of the digit set $\{0, 1, \dots, b-1\}$. In other words, the permutation used for the second digit depends on the value of the first digit, and the permutation applied to the $k$-th digit depends on the values of all the previous digits. Of course, a $(t, m, s)$-net remains a $(t, m, s)$-net under nested scrambling. However, nested scrambling requires $b^{l-1}$ permutations to scramble the $l$-th digit.
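For intuition, the following sketch implements nested scrambling of one coordinate in base 2, where every digit permutation is either the identity or a bit flip, so the tree of prefix-dependent permutations reduces to one lazily drawn random bit per prefix. The function name and data layout are our own illustrative choices; this does not reproduce the NAG C Library routine [7] used later in the comparisons.

```python
import random
from collections import defaultdict

def owen_scramble_base2(values, n_digits=30, seed=0):
    """Nested (Owen) scrambling of one coordinate of a point set in base 2.

    In base 2 a permutation of {0,1} is either the identity or a flip, so the
    permutation assigned to each digit position and each digit prefix is stored
    as a single random bit, drawn lazily and shared by all points with that prefix.
    """
    rng = random.Random(seed)
    # flips[k][prefix] -> random bit deciding whether the (k+1)-th digit is flipped
    flips = [defaultdict(lambda: rng.getrandbits(1)) for _ in range(n_digits)]
    out = []
    for x in values:
        prefix, z, frac = 0, 0.0, x
        for k in range(n_digits):
            frac *= 2
            digit = int(frac)                                  # k-th binary digit of the original point
            frac -= digit
            z += (digit ^ flips[k][prefix]) / 2.0 ** (k + 1)   # permuted digit
            prefix = (prefix << 1) | digit                     # the next permutation depends on this prefix
        out.append(z)
    return out

# Each coordinate of a multidimensional sequence is scrambled independently (different seeds).
scrambled_first_coord = owen_scramble_base2([0.5, 0.75, 0.25, 0.375], seed=7)
```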
The rate for scrambled-net Monte Carlo is $O\!\left(n^{-3/2} (\log n)^{(s-1)/2}\right)$ in probability, while the rate for unscrambled nets is $O\!\left(n^{-1} (\log n)^{s-1}\right)$ for $(t, m, s)$-nets or $O\!\left(n^{-1} (\log n)^{s}\right)$ along $(t, s)$-sequences [28]. The first rate is an average case result for a fixed function $f$ taken over random permutations. Other findings pertain to the worst-case performance over functions given a fixed set of integration points. Since scrambled nets retain their net structure, these worst-case bounds are also applicable to them [28]. Some scrambling methods do not alter the asymptotic discrepancy of quasi-random sequences [6]. Despite improving the quality of quasi-random sequences, this enhancement is not immediately evident in the computation of the discrepancy. Moreover, it is presently impossible to theoretically demonstrate that one scrambled quasi-random sequence outperforms another. While scrambling does not impact the theoretical bounds on the discrepancy of these sequences, it does enhance the measures of two-dimensional projections and the evaluation of high-dimensional integrals.
At its core, Owen’s nested scrambling [6] relies on the randomization of a single digit during each iteration, and it is a powerful technique applicable to all $(t, s)$-sequences. However, from an implementation perspective, nested scrambling (or what is often referred to as path-dependent permutations) requires significant bookkeeping and can lead to more complex implementations. On the other hand, it has been established that its convergence rate is $O\!\left(n^{-3/2} (\log n)^{(s-1)/2}\right)$. While this rate is commendable, it is still not optimal, even when dealing with smooth integrands. Removing the logarithmic term from the estimate would make the rate optimal. Nevertheless, it remains an open question whether this estimate is precise, i.e., whether the logarithm can indeed be eliminated. Importantly, the demonstrated convergence rate for Owen’s nested scrambling algorithm surpasses that of unscrambled nets, which stands at $O\!\left(n^{-1} (\log n)^{s-1}\right)$. Consequently, it becomes essential to conduct numerical comparisons between our algorithms and Owen’s nested scrambling.
1.4. Concept of Sensitivity Analysis
The process of sensitivity analysis (SA) is of crucial importance for large-scale mathematical models. This process includes the following three steps:
first, one should define the probability distributions for the input parameters under consideration;
second, samples should be generated according to the defined probability distributions using a proper sampling strategy;
third, an efficient approach is necessary for sensitivity analysis to be applied to study the output variance according to the variation in the inputs.
Moreover, there are additional stages during the process of providing sensitivity analysis to a particular mathematical model: (i) approximation, which is an important link between the generation of experimental data and the mathematical technology for sensitivity analysis, and (ii) use of a proper probability approach for computing specific sensitivity measures.
A variety of sensitivity analysis techniques are documented [29]. Most current methods for conducting SA hinge on particular assumptions about the model’s behavior (such as linearity, monotonicity, and additivity of the relationship between model inputs and outputs). These assumptions commonly apply to a wide array of mathematical models. Nonetheless, some models incorporate substantial nonlinearities and/or stiffness, thereby invalidating assumptions about linearity and additivity. This issue is particularly pronounced when working with nonlinear systems of partial differential equations. Our research focuses on the UNI-DEM, a mathematical model that simulates the transport of air pollutants and other substances across extensive geographical regions. The system of partial differential equations captures key physical phenomena like advection, diffusion, deposition, and both chemical and photochemical processes affecting the studied species. Additionally, emissions and fluctuating meteorological conditions are considered. Nonlinearity in the equations predominantly stems from modeling chemical reactions [1]. If the model outputs are sensitive to a particular process, it suggests that the process may require refinement or more accurate modeling. Our aim is to enhance the reliability of the model predictions and identify processes needing further scrutiny as well as input parameters that demand higher-precision measurements. Thorough sensitivity analysis is crucial for uncovering potential areas where model simplification can occur. As a result, the development and exploration of more precise and robust sensitivity analysis methodologies are essential.
Among quantitative methods, variance-based techniques are frequently employed [4]. A literature review based on high-impact-factor journals from Science and Nature (Thomson Reuters, Journal Citation Reports, April 2015) and all Elsevier journals, using Scopus bibliometric search tools for publications from 2005 to 2014, is presented in [30]. This study yields the following conclusions:
There is a progressively increasing proportion of global sensitivity analysis approaches, although local techniques remain predominant.
Regression- and variance-based techniques are the most favored.
Medicine and chemistry are the leading scientific domains in applying global sensitivity analysis.
Two prominent variance-based methods were utilized: the Sobol’ approach and the extended Fourier Amplitude Sensitivity Test (eFAST). These methods were implemented using Monte Carlo algorithms (MCAs) or the SIMLAB software tool for sensitivity analysis [8]. In the eFAST method, developed to estimate the total effects, the variance of the model output $y$ (a $d$-dimensional integral) is expressed as a single-dimensional integral with respect to a scalar variable $s$. This transformation involves representing each input variable $x_j$ as $x_j = G_j(\sin \omega_j s)$, where $G_j$ represents a suitable set of transformations and $\omega_j$ are integer frequencies (refer to [31] for detailed information).
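As an illustration of this transformation, the sketch below samples points along the search curve using the triangle-wave form of $G_j$ that is commonly used in the eFAST literature; the function name, this particular choice of $G_j$, and the frequencies in the example are our own assumptions and need not match the SIMLAB implementation [8].

```python
import numpy as np

def efast_samples(omegas, n_s, phases=None, rng=None):
    """Sample points along the eFAST space-filling search curve.

    Each input x_j is driven by its own integer frequency omega_j; the transformation
    used here is x_j(s) = 1/2 + (1/pi) * arcsin(sin(omega_j * s + phi_j)).
    """
    rng = rng or np.random.default_rng(0)
    omegas = np.asarray(omegas, dtype=float)
    if phases is None:
        # random phase shifts give different resamplings of the same curve
        phases = rng.uniform(0.0, 2.0 * np.pi, size=omegas.size)
    s = np.linspace(-np.pi, np.pi, n_s, endpoint=False)
    # shape (n_s, d): one row per sample point along the curve
    return 0.5 + np.arcsin(np.sin(np.outer(s, omegas) + phases)) / np.pi

# Example: three inputs with well-separated integer frequencies
X = efast_samples(omegas=[11, 35, 79], n_s=257)
```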
Sobol’ indices offer advantages over eFAST, particularly in computing higher interaction terms in an analogous way to main effects. In the Sobol’ measure, each effect (main or interaction) is determined by evaluating a multidimensional integral using the Monte Carlo method. The strength of the Sobol’ method lies in its capacity to calculate the total sensitivity index with only one Monte Carlo integral per factor.
The assessment of parameter importance can be investigated through numerical integration in the context of the analysis of variance (ANOVA). Several unbiased Monte Carlo estimators for global sensitivity indices were devised, leveraging the ANOVA decomposition of the model function [32,33,34,35].
1.5. Sobol’ Approach
In variance-based sensitivity analyses, the primary indicator associated with a given (normalized) input parameter $x_i$ is typically defined as

$S_i = \frac{\mathbf{D}\left[ \mathbf{E}(u \mid x_i) \right]}{\mathbf{D}(u)},$  (4)

where $\mathbf{D}\left[ \mathbf{E}(u \mid x_i) \right]$ represents the variance of the conditional expectation of $u$ given $x_i$ and $\mathbf{D}(u)$ denotes the total variance of $u$. This indicator is termed the first-order sensitivity index by Sobol’ [36] or the correlation ratio by McKay [37]. The total sensitivity index [38] quantifies the overall effect of a given parameter, encompassing all possible coupling terms between that parameter and all the others.
The total sensitivity index (TSI) of an input parameter $x_i$, $i \in \{1, \dots, d\}$, is defined in the following way [36,38,39]:

$S_{x_i}^{tot} = S_i + \sum_{l_1 \ne i} S_{i l_1} + \sum_{l_1, l_2 \ne i,\ l_1 < l_2} S_{i l_1 l_2} + \cdots + S_{i l_1 \dots l_{d-1}},$  (5)

where $S_i$ is called the main effect (first-order sensitivity index) of $x_i$ and $S_{i l_1 \dots l_{j-1}}$ is the $j$-th order sensitivity index. Higher-order terms capture the interaction effects between the unknown input parameters $x_{i_1}, \dots, x_{i_\nu}$, where $\nu \in \{2, \dots, d\}$, on the output variance. Later on, we show how the sensitivity indices $S_{l_1 \dots l_\nu}$ are defined via the variances of conditional expectations $\mathbf{D}_{l_1 \dots l_\nu}$.
The global SA method employed in this work is grounded in decomposing an integrable model function $f$ in the $d$-dimensional factor space into terms of escalating dimensionality [36]:

$f(\mathbf{x}) = f_0 + \sum_{\nu=1}^{d} \sum_{l_1 < l_2 < \dots < l_\nu} f_{l_1 \dots l_\nu}(x_{l_1}, x_{l_2}, \dots, x_{l_\nu}),$  (6)

where $f_0$ is a constant. The total number of summands in Equation (6) is $2^d$. The representation in Equation (6) is referred to as the ANOVA representation of the model function $f(\mathbf{x})$ if each term is chosen to satisfy the following condition [33]:

$\int_{0}^{1} f_{l_1 \dots l_\nu}(x_{l_1}, x_{l_2}, \dots, x_{l_\nu})\, dx_{l_k} = 0, \quad 1 \le k \le \nu, \quad \nu = 1, \dots, d.$
An important comment here is that, if the entire representation (6) of the right-hand side is used, then it does not simplify the initial problem. The main expectation is that a truncated sequence

$f_0 + \sum_{\nu=1}^{L} \sum_{l_1 < \dots < l_\nu} f_{l_1 \dots l_\nu}(x_{l_1}, \dots, x_{l_\nu}),$

where $L < d$, is considered as a sufficiently good approximation to the model function $f$. Then,

$\mathbf{D} = \int_{U^d} f^2(\mathbf{x})\, d\mathbf{x} - f_0^2, \qquad \mathbf{D}_{l_1 \dots l_\nu} = \int f_{l_1 \dots l_\nu}^2\, dx_{l_1} \cdots dx_{l_\nu}.$  (7)

The quantities $\mathbf{D}$ and $\mathbf{D}_{l_1 \dots l_\nu}$ represent the total and partial variances, respectively. They are obtained by squaring and integrating Equation (6) over $U^d$, assuming that $f(\mathbf{x})$ is a square-integrable function (ensuring that all terms in Equation (6) are also square-integrable functions). Consequently, the total variance of the model output is partitioned into partial variances in a manner analogous to the model function, resulting in a unique ANOVA decomposition:

$\mathbf{D} = \sum_{\nu=1}^{d} \sum_{l_1 < \dots < l_\nu} \mathbf{D}_{l_1 \dots l_\nu}.$
The use of probability theory concepts is based on the assumption that the input parameters are random variables distributed in $U^d$, which defines $f_0$ and $f_{l_1 \dots l_\nu}(x_{l_1}, \dots, x_{l_\nu})$ also as random variables, with variances given by Equation (7). For example, the first-order term $f_{l_1}(x_{l_1})$ is presented by a conditional expectation:

$f_{l_1}(x_{l_1}) = \mathbf{E}(u \mid x_{l_1}) - f_0.$

Based on the above assumptions about the model function and the output variance, the following quantities

$S_{l_1 \dots l_\nu} = \frac{\mathbf{D}_{l_1 \dots l_\nu}}{\mathbf{D}}, \quad \nu \in \{1, \dots, d\},$  (8)

are referred to as the global sensitivity indices [33]. For $\nu = 1$, this formula coincides with Equation (4), and the so-defined measures correspond to the main effects of the input parameters as well as the effects of the cross-interactions.
Based on Formulas (7) and (8), it becomes apparent that addressing the problem of global sensitivity analysis mathematically involves calculating total sensitivity indices (Equation (5)) of the relevant order. This process requires computing multidimensional integrals of the form in Equation (2). In general, obtaining the total sensitivity index of a single input entails computing $2^{d-1}$ integrals of the type in Equation (7).
Earlier discussions highlighted that the fundamental assumption behind the representation in Equation (6) is that the essential features of model functions (1), which describe typical real-world scenarios, can be captured using low-order subsets of input variables. These subsets include terms up to order $L$, where $L \ll d$. Leveraging this assumption allows us to reduce the dimensionality of the original problem. Following Sobol’ [33], we consider an arbitrary set of $m$ variables ($1 \le m \le d - 1$):

$\mathbf{y} = (x_{k_1}, \dots, x_{k_m}), \quad 1 \le k_1 < \dots < k_m \le d,$

and let $\mathbf{z}$ be the set of $d - m$ complementary variables. Thus, $\mathbf{x} = (\mathbf{y}, \mathbf{z})$. Let $K = (k_1, \dots, k_m)$.
The variances corresponding to the subsets $\mathbf{y}$ and $\mathbf{z}$ can be defined as

$\mathbf{D}_{\mathbf{y}} = \sum_{\nu=1}^{m} \sum_{(l_1 < \dots < l_\nu) \in K} \mathbf{D}_{l_1 \dots l_\nu}, \qquad \mathbf{D}_{\mathbf{z}} = \sum_{\nu=1}^{d-m} \sum_{(l_1 < \dots < l_\nu) \in \bar{K}} \mathbf{D}_{l_1 \dots l_\nu},$  (9)

where the complement of the subset $K$ in the set of all parameter indices is denoted by $\bar{K}$. The left sum in Equation (9) is extended over all subsets $(l_1, \dots, l_\nu)$, where all indices $l_1, \dots, l_\nu$ belong to $K$. Then, the total variance corresponding to the subset $\mathbf{y}$ is

$\mathbf{D}_{\mathbf{y}}^{tot} = \mathbf{D} - \mathbf{D}_{\mathbf{z}}$

and is extended over all subsets $(l_1, \dots, l_\nu)$, where at least one index $l_i \in K$.
The procedure for computing global sensitivity indices is based on the following representation of the variance:

$\mathbf{D}_{\mathbf{y}} = \int f(\mathbf{x})\, f(\mathbf{y}, \mathbf{z}')\, d\mathbf{x}\, d\mathbf{z}' - f_0^2$  (10)

(see [33]). The equality (10) enables the construction of a Monte Carlo algorithm for evaluating $f_0$, $\mathbf{D}$, and $\mathbf{D}_{\mathbf{y}}$:

$\frac{1}{n} \sum_{j=1}^{n} f(\xi_j) \xrightarrow{P} f_0, \qquad \frac{1}{n} \sum_{j=1}^{n} f^2(\xi_j) \xrightarrow{P} \mathbf{D} + f_0^2, \qquad \frac{1}{n} \sum_{j=1}^{n} f(\xi_j)\, f(\eta_j, \zeta'_j) \xrightarrow{P} \mathbf{D}_{\mathbf{y}} + f_0^2,$

where $\xi_j = (\eta_j, \zeta_j)$ is a random sample point and $\zeta'_j$ is an independent sample corresponding to the complementary input subset denoted by $\mathbf{z}$.
For example, for $\mathbf{y} = (x_1)$ and $\mathbf{z} = (x_2, \dots, x_d)$, one obtains $\mathbf{D}_{\mathbf{y}} = \mathbf{D}_1$ and hence $S_1 = \mathbf{D}_1 / \mathbf{D}$.
Calculating the impact of higher-order interactions can be achieved through an iterative process. For instance, consider the following equation for $\mathbf{y} = (x_1, x_2)$:

$\mathbf{D}_{\mathbf{y}} = \mathbf{D}_1 + \mathbf{D}_2 + \mathbf{D}_{12}.$

Here, $\mathbf{D}_{12}$ (and thus $S_{12}$) can be determined if the corresponding first-order sensitivity indices have already been computed.
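To make the estimators above concrete, here is a minimal sketch of the variance-based Monte Carlo procedure using two independent samples and plain pseudo-random points; it follows the representation (10) in spirit, but the function names, the crude $f_0^2$ correction, and the test function are our own illustrative choices and do not reproduce the algorithms compared later in the paper.

```python
import numpy as np

def sobol_indices_mc(f, d, n, rng=None):
    """Crude Monte Carlo estimates of first-order and total Sobol' indices.

    Uses the representation D_y = E[f(y, z) f(y, z')] - f0^2: two independent samples
    are drawn and coordinates are mixed so that the subset y is shared while the
    complementary subset z is resampled. Plain pseudo-random sampling is used here
    purely for illustration.
    """
    rng = rng or np.random.default_rng(42)
    A = rng.random((n, d))              # first independent sample
    B = rng.random((n, d))              # second independent sample
    fA, fB = f(A), f(B)
    f0 = fA.mean()
    D = (fA ** 2).mean() - f0 ** 2      # total variance estimate
    S, S_tot = np.empty(d), np.empty(d)
    for i in range(d):
        ABi = B.copy()
        ABi[:, i] = A[:, i]             # y = (x_i) taken from A, z resampled from B
        fABi = f(ABi)
        D_y = (fA * fABi).mean() - f0 ** 2   # variance of the subset y = (x_i)
        D_z = (fB * fABi).mean() - f0 ** 2   # variance of the complementary subset z
        S[i] = D_y / D
        S_tot[i] = 1.0 - D_z / D             # D_y^tot = D - D_z
    return S, S_tot

# Example: a simple additive-plus-interaction test function
g = lambda X: X[:, 0] + 2.0 * X[:, 1] + X[:, 0] * X[:, 2]
S, S_tot = sobol_indices_mc(g, d=3, n=200_000)
```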
2. Monte Carlo Algorithms Based on Modified Sobol’ Sequences
An algorithm following a shaking technique was proposed recently in [40]. This concept involves taking a Sobol’ $LP_\tau$ point $\tau_i$ of dimension $d$. The point is treated as the center of a sphere with a radius $\rho$. A random point $\xi_i$ is selected uniformly on the sphere. A random variable $\theta_i$ is defined as the value of the corresponding integrand at that random point, i.e., $\theta_i = f(\xi_i)$.
Next, the random points $\xi_i = \tau_i + \rho\, \omega_i$, $i = 1, \dots, n$, are chosen, where $\omega_i$ is a uniformly distributed unit vector in $\mathbb{R}^d$. The radius $\rho$ is relatively small ($\rho \ll 1$), so $\xi_i$ remains within the same elementary interval $E_i$ where the pattern point $\tau_i$ resides. Using a subscript $i$ in $E_i$ indicates that the $i$-th point is within it. Thus, if $\tau_i \in E_i$, then $\xi_i \in E_i$ too.
It was proven in [40] that the mathematical expectation of the random variable $\theta_i = f(\xi_i)$ coincides with the value of the integral (3); that is,

$\mathbf{E}\,\theta_i = S(f).$

This allows for defining a randomized algorithm: one can take the Sobol’ point $\tau_i$ and shake it somewhat. Shaking means defining random points $\xi_i$ according to the procedure described above. For simplicity, this algorithm is abbreviated as MCA-MSS.
The probability error of the MCA-MSS was examined in [2]. It was demonstrated that, for integrands with continuous and bounded first derivatives, specifically $f \in W^1(L; U^d)$, where $L$ is the seminorm introduced in Definition 1, the following relationship applies:

$r_n(f) \le c_1 L\, n^{-\frac{1}{2} - \frac{1}{d}}, \qquad r_n^{(2)}(f) \le c_2 L\, n^{-\frac{1}{2} - \frac{1}{d}},$

where the constants $c_1$ and $c_2$ do not depend on $n$.
Here, a modification of the MCA-MSS is proposed and analyzed. The new algorithm will be called MCA-MSS-S.
It is assumed that $n = m^d$, $m \ge 1$. The unit cube $U^d$ is divided into $n = m^d$ disjoint sub-domains $K_j$, such that they coincide with the elementary $d$-dimensional subintervals defined in Section 1.2 with edge length $1/m$ along every coordinate:

$U^d = \bigcup_{j=1}^{n} K_j, \qquad K_j = \prod_{i=1}^{d} \left[ a_i^{(j)},\, a_i^{(j)} + \frac{1}{m} \right).$

Thus, in each $d$-dimensional sub-domain $K_j$, there is precisely one pattern point $\tau_j$. If the random point remains within $K_j$ after shaking, i.e., $\xi_j \in K_j$, one may attempt to use the smoothness of the integrand $f$ if $f \in W^2(L; U^d)$.
Then, if $p(\mathbf{x})$ is a probability density function such that $\int_{U^d} p(\mathbf{x})\, d\mathbf{x} = 1$, then

$p_j = \int_{K_j} p(\mathbf{x})\, d\mathbf{x} \le \frac{c_1}{n},$

where $c_1$ is a constant. If $d_j$ is the diameter of $K_j$, then

$d_j \le \frac{c_2}{n^{1/d}},$

where $c_2$ is another constant.
In the particular case when the subintervals are cubes with edge $1/m$, we have $p_j = 1/n$ and $d_j = \sqrt{d}/m$. In each sub-domain $K_j$, the central point is denoted by $s_j = (s_j^{(1)}, \dots, s_j^{(d)})$, where $j = 1, \dots, n$.
Let us say we choose two points, $\xi_j$ and $\xi'_j$. The point $\xi_j$ is selected during our MCA-MSS procedure, while $\xi'_j$ is chosen to be symmetrical to $\xi_j$ with respect to the central point $s_j$ in each cube $K_j$. This approach results in a total of $2n$ random points. We can compute all function values $f(\xi_j)$ and $f(\xi'_j)$ for $j = 1, \dots, n$ and approximate the value of the integral as follows:

$S(f) \approx \frac{1}{2n} \sum_{j=1}^{n} \left[ f(\xi_j) + f(\xi'_j) \right].$  (11)

This estimate corresponds to the MCA-MSS-S. We will demonstrate later that this algorithm exhibits an optimal rate of convergence for functions with bounded second derivatives, i.e., for functions from $W^2(L; U^d)$, whereas the MCA-MSS has an optimal rate of convergence for functions with bounded first derivatives: $W^1(L; U^d)$.
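As an illustration of the pairing in Formula (11), the following sketch evaluates the symmetrized estimator under two simplifying assumptions of our own: the pattern point in each sub-domain is taken to be its centre (the actual MCA-MSS-S uses Sobol’ points), and the shaking radius is kept below half the sub-domain edge so that no rejection step is needed.

```python
import numpy as np

def mca_mss_s_sketch(f, d, m, rho_frac=0.3, rng=None):
    """Symmetrized 'shaking' estimator in the spirit of the MCA-MSS-S.

    Simplifications (ours, for illustration): the pattern point in each of the
    n = m^d cubic sub-domains is its centre rather than a Sobol' point, and the
    shaking radius rho = rho_frac * (1/m) / 2 keeps the shaken point inside.
    """
    rng = rng or np.random.default_rng(1)
    h = 1.0 / m                               # edge of each sub-domain
    rho = rho_frac * h / 2.0                  # shaking radius
    grid = (np.arange(m) + 0.5) * h           # centres of the sub-domains along one axis
    centres = np.stack(np.meshgrid(*([grid] * d), indexing="ij"), -1).reshape(-1, d)
    total = 0.0
    for c in centres:
        w = rng.normal(size=d)
        w /= np.linalg.norm(w)                # uniform direction on the sphere
        xi = c + rho * w                      # shaken point
        xi_sym = 2.0 * c - xi                 # point symmetric about the centre
        total += 0.5 * (f(xi) + f(xi_sym))
    return total / len(centres)

# Example: smooth integrand over [0,1]^2
approx = mca_mss_s_sketch(lambda x: np.exp(x.sum()), d=2, m=16)
```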
One can prove the following:
Theorem 1. The quadrature Formula (11) constructed above for integrands $f$ from $W^2(L; U^d)$ satisfies

$r_n(f) \le c_1' L\, n^{-\frac{1}{2} - \frac{2}{d}}$ and $r_n^{(2)}(f) \le c_2' L\, n^{-\frac{1}{2} - \frac{2}{d}},$

where the constants $c_1'$ and $c_2'$ do not depend on $n$.
Proof of Theorem 1. For the fixed point $s_j$, one can use the $d$-dimensional Taylor formula to present the function $f$ in $K_j$ around the central point $s_j$. Since $f \in W^2(L; U^d)$, there exists a $d$-dimensional point $\mathbf{x}'$ lying between $\mathbf{x}$ and $s$ such that

$f(\mathbf{x}) = f(s) + \nabla f(s) \cdot (\mathbf{x} - s) + \frac{1}{2} (\mathbf{x} - s)^{T} H(\mathbf{x}') (\mathbf{x} - s),$  (12)

where $\nabla f(s) = \left( \frac{\partial f}{\partial x_1}(s), \dots, \frac{\partial f}{\partial x_d}(s) \right)$ and $H(\mathbf{x}')$ is the matrix of second derivatives of $f$ evaluated at $\mathbf{x}'$. To simplify matters, the superscript $(j)$ of the argument in the last two formulas is omitted, assuming that the formulas apply to the cube $K_j$. Now, we can rewrite Formula (12) at the previously defined random points $\xi$ and $\xi'$, both located within $K_j$. This yields

$f(\xi) = f(s) + \nabla f(s) \cdot (\xi - s) + \frac{1}{2} (\xi - s)^{T} H(\mathbf{x}'_1) (\xi - s),$  (13)

$f(\xi') = f(s) - \nabla f(s) \cdot (\xi - s) + \frac{1}{2} (\xi - s)^{T} H(\mathbf{x}'_2) (\xi - s),$  (14)

where $\mathbf{x}'_2$ is another $d$-dimensional point lying between $\xi'$ and $s$. Adding Equations (13) and (14), we obtain

$f(\xi) + f(\xi') = 2 f(s) + \frac{1}{2} (\xi - s)^{T} \left[ H(\mathbf{x}'_1) + H(\mathbf{x}'_2) \right] (\xi - s).$

Due to the symmetry, there is no term involving the gradient $\nabla f(s)$ in the previous formula. If we examine the variance $\mathbf{D}\left[ f(\xi) + f(\xi') \right]$, keeping in mind that the variance of the constant $2 f(s)$ is zero, we arrive at

$\mathbf{D}\left[ f(\xi) + f(\xi') \right] \le \mathbf{E} \left[ \frac{1}{2} (\xi - s)^{T} \left( H(\mathbf{x}'_1) + H(\mathbf{x}'_2) \right) (\xi - s) \right]^{2}.$

Given that $f \in W^2(L; U^d)$, it is possible to enhance the last inequality by substituting the entries of $H(\mathbf{x}'_1)$ and $H(\mathbf{x}'_2)$ with the seminorm $L$ (while removing the front bracket) and replacing the products of the components of $\xi - s$ with the squared diameter of the sub-domain, $d_j^2$.
Now, let us return to the notation with superscripts, bearing in mind that the preceding considerations apply to an arbitrary sub-domain $K_j$. The variance can be estimated as follows:

$\mathbf{D}\left[ f(\xi^{(j)}) + f(\xi'^{(j)}) \right] \le c\, L^{2}\, d_j^{4}.$

Therefore, the variance of the estimator defined by Formula (11) can be estimated as follows:

$\mathbf{D}\,\theta_n = \frac{1}{4 n^{2}} \sum_{j=1}^{n} \mathbf{D}\left[ f(\xi^{(j)}) + f(\xi'^{(j)}) \right] \le \frac{c\, L^{2}}{4 n} \max_{j} d_j^{4}.$

Thus,

$\mathbf{D}\,\theta_n \le c''\, L^{2}\, n^{-1 - \frac{4}{d}}.$  (15)

Applying Tchebychev’s inequality to the variance (15) leads to the following estimation

$r_n(f) \le c\, L\, n^{-\frac{1}{2} - \frac{2}{d}}$

for the probable error $r_n$, where $c$ is a constant depending on the chosen probability $P$ but not on $n$, which concludes the proof. □
It is noteworthy that the Monte Carlo algorithm MCA-MSS-S possesses an optimal rate of convergence for functions with continuous and bounded second derivatives [9]. This signifies that the rate of convergence $O\!\left(n^{-\frac{1}{2} - \frac{2}{d}}\right)$ cannot be enhanced for the functional class $W^2(L; U^d)$ within the class of randomized algorithms $\mathcal{A}^{R}$.
Note that both the MCA-MSS and MCA-MSS-S have a single control parameter, namely the radius $\rho$ of the sphere of shaking. However, effectively utilizing this control parameter increases the computational complexity. The challenge arises because, after shaking, the random point might leave the multidimensional sub-domain. Consequently, after each such operation, one must verify whether the random point remains within the same sub-domain. Verifying if a random point lies within a given domain is computationally expensive when dealing with a large number of points. A minor modification to the MCA-MSS-S can help to overcome this difficulty.
Instead of shaking, one can simply generate a random point $\xi_j$ uniformly distributed inside $K_j$, and then take the point $\xi'_j$ symmetric to it with respect to the central point $s_j$. This fully randomized approach simulates the MCA-MSS-S, but the shaking occurs with different radii $\rho_j$ in each sub-domain. We call this algorithm SS-MCA as it resembles the stratified symmetrized Monte Carlo method [11]. Clearly, the SS-MCA is less computationally expensive than the MCA-MSS-S, but it lacks a control parameter like the radius $\rho$, which can be considered as a randomly chosen parameter in each sub-domain $K_j$.
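A minimal sketch of this fully randomized variant, under the assumption of cubic sub-domains with edge $1/m$ (the function name is ours), is given below.

```python
import numpy as np

def ss_mca_sketch(f, d, m, rng=None):
    """Stratified symmetrized estimator in the spirit of the SS-MCA.

    In each of the n = m^d cubic sub-domains a point is drawn uniformly and paired
    with its mirror image about the sub-domain centre; the integral is approximated
    by the average of f over all 2n points.
    """
    rng = rng or np.random.default_rng(2)
    h = 1.0 / m
    grid = (np.arange(m) + 0.5) * h
    centres = np.stack(np.meshgrid(*([grid] * d), indexing="ij"), -1).reshape(-1, d)
    xi = centres + rng.uniform(-h / 2.0, h / 2.0, size=centres.shape)  # uniform point in each cube
    xi_sym = 2.0 * centres - xi                                        # mirrored partners
    return 0.5 * (np.mean([f(p) for p in xi]) + np.mean([f(p) for p in xi_sym]))
```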
It is important to note that all three algorithms, the MCA-MSS, MCA-MSS-S, and SS-MCA, have optimal rates of convergence for their respective functional classes. Specifically, the MCA-MSS is optimal in $W^1(L; U^d)$, while both the MCA-MSS-S and SS-MCA are optimal in $W^2(L; U^d)$.