Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis

Verrelst, Jochem; Sabater, Neus; Rivera, Juan Pablo; Muñoz-Marí, Jordi; Vicent, Jorge; Camps-Valls, Gustau; Moreno, José

doi:10.3390/rs8080673

Open AccessArticle

Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis

¹

Image Processing Laboratory (IPL), Parc Científic, Universitat de València, Paterna 46980, Spain

²

Departamento de Oceanografía Física, Centro de Investigación Científica y de Educación Superior de Ensenada, Ensenada 22860, Mexico

^*

Author to whom correspondence should be addressed.

Remote Sens. 2016, 8(8), 673; https://doi.org/10.3390/rs8080673

Submission received: 9 April 2016 / Revised: 1 August 2016 / Accepted: 16 August 2016 / Published: 19 August 2016

Download

Browse Figures

Versions Notes

Abstract

:

Physically-based radiative transfer models (RTMs) help understand the interactions of radiation with vegetation and atmosphere. However, advanced RTMs can be computationally burdensome, which makes them impractical in many real applications, especially when many state conditions and model couplings need to be studied. To overcome this problem, it is proposed to substitute RTMs through surrogate meta-models also named emulators. Emulators approximate the functioning of RTMs through statistical learning regression methods, and can open many new applications because of their computational efficiency and outstanding accuracy. Emulators allow fast global sensitivity analysis (GSA) studies on advanced, computationally expensive RTMs. As a proof-of-concept, three machine learning regression algorithms (MLRAs) were tested to function as emulators for the leaf RTM PROSPECT-4, the canopy RTM PROSAIL, and the computationally expensive atmospheric RTM MODTRAN5. Selected MLRAs were: kernel ridge regression (KRR), neural networks (NN) and Gaussian processes regression (GPR). For each RTM, 500 simulations were generated for training and validation. The majority of MLRAs were excellently validated to function as emulators with relative errors well below 0.2%. The emulators were then put into a GSA scheme and compared against GSA results as generated by original PROSPECT-4 and PROSAIL runs. NN and GPR emulators delivered identical GSA results, while processing speed compared to the original RTMs doubled for PROSPECT-4 and tripled for PROSAIL. Having the emulator-GSA concept successfully tested, for six MODTRAN5 atmospheric transfer functions (outputs), i.e., direct and diffuse at-surface solar irradiance (

E_{d i f}

,

E_{d i r}

), direct and diffuse upward transmittance (

T_{d i r}

,

T_{d i f}

), spherical albedo (S) and path radiance (

L_{0}

), the most accurate MLRA’s were subsequently applied as emulator into the GSA scheme. The sensitivity analysis along the 400–2500 nm spectral range took no more than a few minutes on a contemporary computer—in comparison, the same analysis in the original MODTRAN5 would have taken over a month. Key atmospheric drivers were identified, which are on the one hand aerosol optical properties, i.e., aerosol optical thickness (AOT), Angstrom coefficient (AMS) and scattering asymmetry variable (G), mostly driving diffuse atmospheric components,

E_{d i f}

and

T_{d i f}

; and those affected by atmospheric scattering,

L_{0}

and S. On the other hand, as expected, AOT, AMS and columnar water vapor (CWV) in the absorption regions mostly drive

E_{d i r}

and

T_{d i r}

atmospheric functions. The presented emulation schemes showed very promising results in replacing costly RTMs, and we think they can contribute to the adoption of machine learning techniques in remote sensing and environmental applications.

Keywords:

emulator; global sensitivity analysis; machine learning; radiative transfer models; PROSPECT; SAIL; MODTRAN

Graphical Abstract

1. Introduction

Since the advent of optical remote sensing, physically-based radiative transfer models (RTMs) have deeply helped in understanding the radiation processes occurring on the Earth’s surface and their interactions with vegetation and atmosphere (e.g., [1,2]). RTMs are deterministic models that describe absorption and scattering, and some of them even describe the microwave region, thermal emission or sun-induced chlorophyll fluorescence emitted by vegetation (e.g., [3,4]). They are useful in a wide range of applications including (i) sensitivity analysis; (ii) developing inversion models to accurately retrieve atmospheric and vegetation properties from remotely sensed data (see [5] for a review); and (iii) to generate artificial scenes as would be observed by an optical sensor (e.g., [6,7]). Plant and atmospheric RTMs are currently used in an end-to-end simulator that functions as a virtual laboratory in the development of next-generation optical missions [8,9].

When it comes to vegetation analysis, RTMs have found a wide range of applications to model, study and understand light interception by plant canopies and the interpretation of vegetation reflectance in terms of biophysical characteristics [2,10,11]. A diversity of canopy RTMs have been developed over the last three decades with varying degrees of complexity. Gradual improvement in RTMs accuracy, yet in complexity too, have diversified RTMs from simple turbid medium RTMs towards advanced Monte Carlo RTMs that allow for explicit 3D representations of complex canopy architectures (e.g., see the RAMI exercises [12,13] for a meticulous comparison). This evolution has resulted in an increase in the computational requirements to run the model, which bears implications towards practical applications. In general, RTMs can be categorized as either (1) ’economically invertible’ (or computationally cheap); or as (2) ’non-economically invertible’ models (or computationally expensive). These terms refer to the model complexity and associated processing speed constraining the inversion of such models: Economically invertible models are models with relatively few input parameters and fast processing that enables fast calculations and so applications such as model inversion or rendering of simulated scenes. Examples of this category include the widely used leaf RTM PROSPECT [14], canopy RTM SAIL [15] and atmospheric RTM 6S [16] and OSS [17].

Non-economically invertible RTMs refer to advanced RTMs, often with a large number of input variables and sophisticated computational and mathematical modelling. These type of RTMs enable the generation of complex or detailed scenes, but at the expense of a tedious computational load. In short, the following families of RTMs can be considered as non-economically invertible: (1) Monte Carlo ray tracing models (e.g., Raytran [18], FLIGHT [19] and Drat [20]); (2) voxel-based models (e.g., DART [21]) and (3) advanced integrated vegetation and atmospheric transfer models (e.g., SimSphere [22], SCOPE [23] and MODTRAN [24]). Although these advanced models serve perfectly as virtual laboratories for fundamental research on light-vegetation and atmosphere interactions (see e.g., [18,25]), their high computational cost make them impractical for applications such as inversion. Consequently, when it comes to selecting an RTM for applications that demand many simulations, the current pragmatic approach is to search for a good balance between acceptable accuracy and computational complexity [5].

Regardless whether an RTM is computationally expensive or not, an important requirement is the identification of the key input variables driving the spectral output and variables of lesser influence. By identifying variables of lesser influence, models can be greatly simplified, which facilitates practical applications such as inversion [26]. To achieve this, a sensitivity analysis is required. A sensitivity analysis can be simply defined as the process of determining the effect of changing the value of one or more input variables, and observing the effect that this has on the considered model’s output [22,27].

In general, sensitivity analysis methods can be categorized as either `local’ or `global’. Local sensitivity analysis (LSA) methods are often referred to as “one-factor-at-a-time”, because they involve changing one input variable at a time whilst holding all others at their default values, then measuring variation in the outputs. A drawback of LSA methods is that they are informative only at the default point where the calculation is executed and do not encompass the entire input variable space. Thus, LSA methods are inadequate for analysis of complex models having many variables, and that may be highly dimensional and/or non-linear [27,28,29]. Unlike LSA, global sensitivity analysis (GSA) explores the full input variable space [27]. Variance-based GSA methods aim to quantify the amount of variance that each input variable contributes to the unconditional variance (variance across all simulations) of the model’s output [29]. The approach is able to quantify the sensitivity to each of the model variables and their interactions, which cannot be done using LSA studies. GSA is thus required to identify the driving variables of an RTM. However, a GSA requires many simulations (depending on the number of variables, typically in the orders of 10 thousands), which makes it computationally demanding. It bears the consequence that so far GSA studies have been restricted to merely abstract, computationally cheap models, typically coming from the PROSPECT and SAIL families (e.g., [30,31,32,33]).

Identification and quantification of key variables in realistic vegetation an atmospheric radiative transfer interactions would enable a more sound understanding of atmospheric mechanisms with practical applications towards improved retrieval schemes. In order to bypass the computational burden of costly RTMs, in this paper a computationally effective technique is proposed by only using a limited number of computer code runs. The technique consists of approximating the RTM by a surrogate model with low computation time, also referred to as meta-model or emulator [34]. Once having a successful emulator developed, the idea is that it can replace the original RTM and be readily applied to perform sensitivity analysis. Accordingly, we posit here that emulators can provide an efficient mean for doing a GSA on large and expensive models.

Although successfully applied into GSA schemes for quantifying drivers of climate and environmental process-based models (e.g., [22,35,36]), so far this technique is largely unknown in remote sensing science. Only recently Rivera et al. (2015) [37] and Gómez-Dans et al. (2016) [38] started exploring the possibility of emulating RTMs and applying them for retrieval applications. The authors constructed RTM-like emulators with capabilities to deliver accurately and quasi-instantly spectral outputs such as reflectance, fluorescence and radiance. Advancing along this line, when emulators appear to approximate RTMs with high accuracy, consequently, it would become possible to apply GSA to any RTM, including computationally expensive RTMs such as MODTRAN.

In light of the above, the main objective of this study is to analyze the possibility of approximating the functioning of RTMs by emulators, in order implement these emulators into a GSA scheme for enabling analyzing computationally intensive RTMs into its driving input variables. This objective can be broken down into the following sub-objectives: (1) to analyze three machine learning regression algorithms on their accuracy to function as emulators for three RTMs with increasing complexity at the leaf, canopy and atmosphere scale, i.e., PROSPECT-4, PROSAIL and eventually the computationally expensive RTM MODTRAN5; (2) to apply the PROSPECT-4 and PROSAIL-like emulators into a GSA scheme and compare its results against GSA reference results as obtained from original PROSPECT-4 and PROSAIL simulations; and finally (3) to apply MODTRAN5-generated emulators for various MODTRAN5 outputs into the GSA scheme to enable identifying driving atmospheric variables along the 400–2500 nm spectral range.

2. Emulator Theory

Emulation is a technique used to estimate model simulations when the computer model under investigation is too computationally costly to be run enough times [34]. Various emulators based on statistical learning have already been developed in the climate and environmental modeling community for the last few years (e.g., [22,35,36,39,40,41,42,43]). These emulators have in common that they were developed from adaptive, flexible nonparametric regression algorithms, typically within the family of machine learning regression algorithms (MLRAs) [44]. Examples of MLRAs include Gaussian processes regression and neural networks [38,45]. The emulator uses a limited number of simulator runs, i.e., input-output pairs (corresponding to training samples), to infer the values of the complex simulator output given a yet-unseen input configuration. With respect to applying emulators into sensitivity analysis, specific advantages of the emulator are [22]:

The emulator is derived from a relatively small number of model runs covering a multidimensional input space, and is used to derive a computationally inexpensive and efficient analysis of all the sensitivity analysis computations regarding the original deterministic model code.
Once the emulator is built, it is not necessary to perform any additional runs with the model, regardless of how many analyses are required to assess the simulator’s behaviour. This is a very important advantage in the use of this method compared to other conventional GSA methods (e.g., those reviewed in [27] that typically require a fresh set of simulator runs for each analysis). Therefore, compared for instance to Monte Carlo-type methods, the approach requires far fewer model runs since the original code is only run to build the dataset to train the emulator.

Consequently, a two-step approach is pursued. The first step involves building a statistically-based representation (i.e., an emulator) of the simulator (i.e., an RTM) from a set of training data points derived from runs of the actual model under study. These training data pairs should ideally cover the multidimensional input space using a space-filling algorithm. Only one set of runs of the simulator is used to build the emulator. After the emulator is built, no more simulator runs are needed, no matter how many analyses are required of the simulator’s behavior. The second step uses the emulator built in the first step to compute the output that is otherwise generated by the simulator [34]. Note that building an emulator is essentially the same as building nonparametric regression models used for retrieval (e.g., [46,47]), but then in reversed order: whereas a retrieval model converts reflectance input into one or more output biophysical variables, an emulator converts input biophysical variables into output spectral data.

An important criterion regarding emulating an RTM is the ability to deliver a full spectrum, i.e., spectral output at multiple bands. This bears the consequence that the MLRA (i.e., emulator) should be able to generate multiple outputs in order to reconstruct a full spectral profile. However, the full, contiguous spectral profile typically consists above 2000 bands when binned to 1 nm resolution. This can take considerable training time, especially when using neural networks [37]. To enable the models to cope with large hyperspectral datasets (e.g., reflectance, transmittance, fluorescence), applying a principal component analysis (PCA) dimensionality reduction step [48] prior to training a model has been implemented. This step consists on projecting the RTM output spectra onto the top eigenvectors computed with PCA,

p ≪ B

, where p is the number of selected components (usually one typically takes enough components to ensure that at least 99.9% of variance is retained), and B is the number of original bands of the spectra. The regression methods are then trained to predict the PCA reduced spectra, and the inverse of the PCA transformation is subsequently applied to obtain the reconstructed output spectra. This dimensionality reduction strategy not only greatly speeds up the training phase but also makes the problem better conditioned [36,37,38]. An additional advantage of applying the PCA transformation is that it allows converting single-output MLRAs into multiple output methods, i.e., meaning that in principle any multivariate regression algorithm can function as emulator. By first applying a PCA to spectral data the hyperspectral data is reduced to a given number of components. By subsequently looping over the single component multiple models can be trained by a single-output MLRA. Afterwards the signal is again reconstructed. Although the iterative training takes some computational time, it is considerably faster than training a hyperspectral dataset band-by-band.

Based on the literature review above and an earlier comparison study conducted [37], the following three MLRAs potentially serve as powerful methods to function as emulators, being kernel ridge regression (KRR), neural networks (NN) and Gaussian processes regression (GPR). These methods have been amply described in earlier publications (e.g., [5,38,44,46,49,50]). The interested reader can find more details therein. The source codes of each of these methods, along with many other alternatives, are freely available in the simpleR toolbox [51]. The Automated Radiative Transfer Models Operator (ARTMO) Graphic User Interface (GUI) is a modular software package that includes these MLRA methods as well [47], and as described later, provides some easy-to-use tools for running RTMs and further processing.

3. Global Sensitivity Analysis Theory

Various variance-based GSA methods have been presented in the literature, including the Sobol’ method [52], the Fourier Amplitude Sensitivity Test (FAST) [53] and a modified version of the Sobol’ method proposed by [54]. This modification has been demonstrated to be effective in identifying the so-called Sobol’s sensitivity indices. These indices quantify both the main sensitivity effects (first-order effects:

S_{i}

, i.e., the contribution to the variance of the model output by each input variables) and total sensitivity effects (

S_{T i}

, i.e., the first-order effect plus interactions with other input variables) of input variables. This method has been applied here. A description according to Song et al. (2012) [55] is given below.

Formally, we have a model

y = f (x)

, where y is the model output, and

x = {[x_{1}, x_{2}, . . ., x_{k}]}^{⊤}

is the input feature vector. A variance decomposition of

f (\cdot)

as suggested by Sobol [52] is:

\begin{matrix} V (y) = \sum_{i = 1}^{k} V_{i} + \sum_{i = 1}^{k} \sum_{j = i + 1}^{k} V_{i j} . . . + V_{1, \dots, k}, \end{matrix}

(1)

where

x

is rescaled to a k-dimensional unit hypercube

Ω^{k}

,

Ω^{k}

=

{x | 0 \leq x_{i} \leq 1, i = 1, \dots, k}

;

V (y)

is the total unconditional variance;

V_{i}

is the partial variance or `main effect’ of

x_{i}

on y and given by the variance of the conditional expectation

V_{i} = V [E (y | x_{i})]

;

V_{i j}

is the joint impact of

x_{i}

and

x_{j}

on the total variance minus their first-order effects. Here, the first-order sensitivity index

S_{i}

and total effect sensitivity index

S_{T i}

are given as [27]:

\begin{matrix} S_{i} = \frac{V_{i}}{V (y)} = \frac{V [E (y | x_{i})]}{V (y)} \end{matrix}

(2)

\begin{matrix} S_{T i} = S_{i} + \sum_{j \neq i} S_{i j} + . . . = \frac{E [V (y | x_{\sim i})]}{V (y)}, \end{matrix}

(3)

where

x_{\sim i}

denotes variation in all input variables and

x_{i}

,

S_{i j}

is the contribution to the total variance by the interactions between variables. Following Saltelli et al. (2010) [54], to compute

S_{i}

and

S_{T i}

two independent input variable sampling matrices

P

and

Q

of dimensions

N \times k

are created, where N is the sample size and k is the number of input variables. Each row in matrices

P

and

Q

represents a possible value of

x

. The variable ranges in the matrices are scaled between 0 and 1. The Monte Carlo approximations for

V (y)

,

S_{i}

and

S_{T i}

are defined as follows [29,54]:

\begin{matrix} \hat{V} (y) = \frac{1}{N} \sum_{j = 1}^{N} {(f {(P)}_{j})}^{2} - {\hat{f}}_{0}^{2}, {\hat{f}}_{0} = \frac{1}{N} \sum_{j = 1}^{N} f {(P)}_{j}, \end{matrix}

(4)

and

\begin{matrix} {\hat{S}}_{i} = \frac{1}{N} \sum_{j = 1}^{N} \frac{f {(Q)}_{j} (f {(P_{Q}^{(i)})}_{j} - f {(P)}_{j})}{\hat{V} (y)}, \hat{S_{T i}} = \frac{1}{2 N} \sum_{j = 1}^{N} \frac{{(f {(P)}_{j} - f {(P_{Q}^{(i)})}_{j})}^{2}}{\hat{V} (y)}, \end{matrix}

(5)

where

\hat{\dots}

is the estimate;

{\hat{f}}_{0}

is the estimated value of the model’s output; we abused notation by defining

f (P)

as all outputs for row vectors in

P

;

P_{Q}^{(i)}

represents all columns from

P

except the i-th column which is from

Q

, using a radial sampling scheme [56]. Matrices are generated with a Latin hypercube sampling of size

N \times 2 k

where

P

and

Q

are the left and right half of this matrix, respectively [54]. In order to compute

S_{i}

and

S_{T i}

simultaneously, a scheme suggested by [57] was used which reduced the model runs to

N (k + 2)

.

4. Applied RTMs

As a proof-of-concept, the utility of emulators into the GSA framework will be demonstrated for three different RTMs at three scales with increasing degree of complexity, i.e., PROSPECT-4 at the leaf, PROSAIL at the canopy, and MODTRAN5 at the atmospheric scale. A brief description of the selected RTMs is given below.

4.1. PROSPECT-4

At the leaf scale, the PROSPECT-4 model [14] is currently one of the most widely used leaf optical models and is based on earlier PROSPECT versions. The model calculates leaf reflectance and transmittance as a function of its biochemistry and anatomical structure. It consists of four parameters, those being leaf structure (N), equivalent water thickness (

C w

), chlorophyll content (

C a b

) and dry matter content (

C d

). PROSPECT-4 simulates directional reflectance and transmittance over the solar spectrum from 400 to 2500 nm at the fine spectral resolution of 1 nm.

4.2. PROSAIL

At the canopy scale, SAIL is the most popular canopy RTM; also likely due to its availability and rather low number of input variables [15]. SAIL is based on a four-stream approximation of the RT equation, in which case one distinguishes two direct fluxes (incident solar flux and radiance in the viewing direction) and two diffuse fluxes (upward and downward hemispherical flux) [58]. SAIL inputs consist of leaf area index (

L A I

), leaf angle distribution (

L A D

), ratio of diffuse and direct radiation (

s k y l

), soil coefficient (soil coeff.), hot spot and sun-target-sensor geometry, i.e., solar/view zenith angle and relative azimuth angle (SZA, VZA and RAA, respectively). Spectral input consists of leaf reflectance and transmittance spectra and a soil reflectance spectrum. The leaf optical properties can come from a leaf RTM such as PROSPECT. By coupling PROSPECT with SAIL (PROSAIL), a leaf-canopy model is generated that allows analyzing the impact of leaf biochemical variables at the canopy scale [2]. PROSAIL generates hemispherical and bidirectional top-of-canopy (TOC) reflectance as output.

4.3. MODTRAN5

At the atmosphere scale, MODTRAN5 [24], the moderate resolution transmittance code, is one of the most widely used radiative transfer codes in the atmospheric community [59]. MODTRAN solves the radiative transfer equation by including the effects of molecular and particulate absorption/emission and scattering, surface reflections and emission, solar/lunar illumination, and spherical refraction. The absorption and scattering processes, considering the atmospheric extinction both for aerosols and molecules, are rigorously calculated using DISORT [60] and Correlated-k [61] algorithms. In addition, MODTRAN uses a spherically symmetric atmosphere, consisting of homogeneous layers, each one characterized by its temperature, pressure and concentration of atmospheric compounds. Here, the Snell’s law is used to calculate the curvature of light beams due to refraction between each atmospheric layer.

The large number of MODTRAN inputs (see [62] for a detailed description) allows the user to define the spectral range, illumination/viewing geometry and atmospheric state. A smaller subset of key input variables for general-purpose applications [63] (see Table 1) describe the main interactions between atmosphere and solar radiation in all spectral range. With respect to the geometric conditions, the key variables are

S Z A

,

V Z A

and

R A A

. Regarding the atmospheric state, the key input variables are reduced to two subsets: (1) Aerosol optical properties, described by means of the Aerosol Optical Thickness (AOT), the Angstrom parameter (AMS), and the asymmetry parameter of the Henyey-Greenstein scattering phase function (G); and (2) total columnar water vapor (CWV). In addition, the surface pressure, determined by the topographic elevation (ELEV), completes the set of key inputs variables. Depending on the application, other variables (e.g., vertical temperature profile, vertical aerosol distribution, concentration of CO

_{2}

) might be added. In our study, these variables are set to their default values.

MODTRAN spectral outputs include direct and diffuse transmittance, top of atmosphere (TOA) radiance fluxes, solar/lunar irradiance, horizontal fluxes, cooling rates, etc. These outputs extend from the UV into the far-infrared (0.2–10,000

μ

m) spectral range and are provided at a resolution as fine as 0.1 cm

^{- 1}

(0.01–0.1 nm in the VIS-SWIR spectral range). However, in most MODTRAN remote sensing application (e.g., [9,64,65,66]), these outputs are decomposed in a set of key atmospheric transfer functions [63,67] that allows to invert the atmospheric TOA radiance equation. The standard formulation under the Lambertian assumption is given by:

L_{T O A} = L_{0} + \frac{(E_{d i r} μ_{s} + E_{d i f}) (T_{d i f} + T_{d i r}) ρ}{π (1 - S ρ)}

(6)

where

L_{0}

is the path radiance (i.e., the radiance scattered by the atmosphere);

E_{d i r}

and

E_{d i f}

are respectively the direct and diffuse at-surface solar irradiance;

T_{d i r}

and

T_{d i f}

are respectively the surface-to-sensor direct and diffuse atmospheric transmittance; S is the spherical albedo, which accounts for the fraction of radiance that is back-scattered to the surface from the atmosphere;

μ_{s}

is the cosine of SZA; and ρ is the Lambertian surface reflectance. Hence, these six atmospheric transfer functions (

L_{0}

,

E_{d i r}

,

E_{d i f}

,

T_{d i r}

,

T_{d i f}

and S) are critical to understand the radiative transfer through the atmosphere and, thus, were selected as MODTRAN outputs to be decomposed into its driving variables through a GSA study.

5. Experimental Setup

In this section, we summarize the experimental setup for running the emulator-GSA experiments in the next section. We first discuss on the training and validation strategies for developing the emulators, and then on how we apply the GSA methodology. The large majority of the work was undertaken within the in-house developed ARTMO framework [68]. ARTMO consists of a suite of leaf and canopy RTMs, retrieval toolboxes, and recently a GSA toolbox [69] and an Emulator toolbox was added [37]. The Emulator toolbox is based on an earlier developed MLRA retrieval toolbox, which is equipped with a diversity of nonparametric regression models, mostly from the family of MLRAs [47]. In the Emulator toolbox several of those MLRAs can be trained by RTMs, but instead of delivering biophysical variables as outputs they are used as input in the regression model, and spectral data is generated as output. For this study, the Emulator toolbox (v. 1.04) has been expanded with various new utilities, including GPR, dimensionality reduction techniques and various goodness-of-fit indicators. The GSA toolbox (v. 1.04) has been made compatible with emulated RTMs. The toolboxes are downloadable at http://ipl.uv.es/artmo/.

5.1. Emulator Training and Validation

For each of the three RTMs, a look-up table (LUT) was first generated by means of Latin hypercupe sampling [70] within the RTM variable space with minimum and maximum boundaries as given in Table 1. A sampling size was sought for that is balancing between striving for high accuracies while keeping computational burden to the minimum. Earlier emulation studies suggested that using just 400 samples would suffice for GSA studies [22,71]. On the other hand, [37] demonstrated that a larger LUT may lead to the development of a more accurate emulator. Nevertheless, increasing input simulations goes at a computational cost, especially for computationally expensive RTMs such as MODTRAN. As an acceptable compromise a LUT size of 500 samples was built.

PROSPECT-4 and PROSAIL LUTs were generated with ARTMO. Their processing time took 12 and 33 s on a contemporary computer (Windows-64 OS, i7-4550 CPU 1.50 GHz, 8 GB RAM). The MODTRAN5 LUT was generated on another computer (Windows7-64 OS, i3-2100 CPU 3.10 GHz, 4 GB RAM) based on [63], and it took about 57 h to generate 500 samples ranging from 400 nm to 2500 nm at a spectral resolution of 5 cm

^{- 1}

(0.08–3 nm). In fact, in some cases MODTRAN5 was malfunctioning, i.e., delivering empty profiles, therefore somewhat more random simulations were generated to fill up these gaps. Also several simulations produced zero or no values in some bands specifically in the 1400 and 1900 nm water vapor absorption regions. To enable further computing, zero values were set close-to-zero, i.e.,

0.1 \times 10^{- 6}

, while the no values were filled up through interpolation. The LUTs were subsequently passed to the Emulator toolbox. To enable training the three selected MLRAs (KRR, NN and GPR), and in order to speed up the model development, a PCA was first applied, and the first 30 principal components were used for MLRA training and data projection. This way we better pose the problem and feed all regression models with the same amount of information. The number 30 was chosen after some initial tests with the PROSPECT-4 and PROSAIL datasets. Although for each of the 3 MLRAs the first 5 components explained already 99.96% of the variance, developing the regression models with less than 10 components led to inaccurate reconstructions in regions of strong absorption. In general, more robust reconstructions with more precise profiles in the strong absorption regions were achieved with having more components added in the regression algorithms. However, looping over more components slows down processing time and gain in accuracy is not always clear. Therefore, 30 components (i.e., 100% explained variance) were evaluated as an acceptable trade-off between accuracy and processing time.

Since emulators only produce an approximation of the original model, such an approximation introduces a source of uncertainty referred to as “code uncertainty” associated with the emulator [34], in such a way the sensitivity measures computed using the emulator are “uncertain”. Therefore, validation of the generated model is an important step in order to quantify the emulator’s degree of accuracy. To validate the accuracy of the emulator, part of the original data is kept aside as reference dataset. Various training/validation sampling design strategies are possible with the Emulator toolbox. In an attempt to keep processing fast, for this study only a single split was applied using 70% samples for training (350#) and the remaining 30% for validation (150#).

The next step involved validating the developed MLRA emulators. The toolbox analyzes one-by-one the validation accuracy of the emulators by calculating the root-mean-square-error (RMSE) difference between emulated spectra and validation RTM spectra. The processing time of model training and validation is also tracked. The toolbox further offers a few visualization tools to inspect the emulation accuracy along the spectral range, such as plotting the RTM vs. emulated output spectra and plotting the residuals, i.e., the difference between RTM-generated and emulated-generated spectra in absolute or relative terms. These plottings will be summarized into some general statistics, i.e., mean, standard deviation (SD), min–max boundaries and percentiles.

5.2. Applied GSA Strategy

The GSA experiments were conducted in the GSA toolbox where the Saltelli’s GSA method [54] has been implemented. For each GSA experiment,

(N (k + 2))

model simulations were run, where N is the sample size and equals to 1000, and k is the number of input variables and equals to 4 variables for PROSPECT-4, 11 for PROSAIL and 8 for MODTRAN5 (cf. Table 1). This led to 6000, 13,000 and 10,000 model runs, respectively. A Latin hypercube sampling to fill up the variable space according to the min–max boundaries given in Table 1 was used to sample the P and Q matrices. In order to gain insight into the actual influence of a given variable, both the main effect (

S_{i}

) and the interaction effects have to be considered. Therefore, only total order index results (

S_{T i}

) are shown. However, the sum of all

S_{T i}

can be greater than 1, due to the interactions, and its magnitude varies as a function of wavelength. Therefore, the

S_{T i}

results were normalized as a percentage.

Given that PROSPECT-4 and PROSAIL are relatively fast RTMs, the sensitivity analysis was first applied with the original models and then repeated with the 3 evaluated MLRA emulators (KRR, NN, GPR). The original GSA results serve as reference to validate the GSA results obtained by the MLRA emulators. For brevity, only PROSPECT-4 leaf reflectance and PROSAIL TOC bidirectional reflectance results are shown, but in principle the analysis can be likewise applied to leaf transmittance or canopy hemispherical reflectance. Also for brevity, for each of the six MODTRAN5 outputs only GSA results obtained by the most accurate emulator are shown. For each analysis processing time is given.

6. Results

Results are organized as follows: first the ability of the MLRA models to function as a accurate emulator are evaluated, then their performances into the GSA scheme are given.

6.1. Validation of Emulators Accuracy

To answer the question whether the statistical regression models can function as accurate emulators, first general goodness-of-fit statistics of emulated spectra against the RTM validation spectra are provided. Table 2 displays the RMSE for the LUTs coming from the three RTMs that were used to train the MLRAs. The normalized RMSE (NRMSE) indicates that relative errors fell well below 0.5%, but significant differences across the three MLRAs and RTMs can be observed. KRR performed generally poorest in accuracy with exception of

E_{d i r}

. NN was validated as most accurate for the majority of LUTs with exceptions for a few MODTRAN5 outputs. However, optimizing its architecture goes at a computational cost, because the number of hidden neurons from 3 to 40 are internally tuned, as well as the learning rate, momentum term, regularization constants and initialization of the weights. NN training took several times longer than the two other MLRAs, usually between 3 and 14 min. Note that the PCA transformation considerably improved the computational efficiency of the training phase; without PCA it took up to a few hours to develop the NN model (results not shown). GPR seems to be more promising: the model consistently delivers high accuracies and is trained relatively fast (it typically took between 1 and 2 min time to train). The cost of GPR is higher than for KRR because the kernel function used includes a length-scale per input feature to be optimized and we developed a GP model per output. KRR provides extremely fast training for this low-size problems, but this goes at the expense of a somewhat poorer accuracy. For all MLRAs, training the models is the only computational cost; once trained, outputs were generated and validated quasi-instantly (less than 0.01 s). When comparing the relative errors across the three RTMs, from the NRMSE results it can be observed that PROSPECT-4 was most accurately emulated, closely followed by most of the MODTRAN5 outputs. The reason for their excellent performances is due to the low number of input variables, 4 and 8, respectively. Emulating PROSAIL was achieved with relative errors of about 10 times poorer than PROSPECT-4, which can be attributed to the higher number of input variables considered.

6.2. Validation of Emulated Spectral Profiles and Residuals

Because the RMSE validation statistics only give general information, a closer look across the emulated spectra is required. Summarizing profiles (mean, SD, min–max) of the RTM validation spectra and corresponding emulated spectra have been plotted. From these two spectral datasets, the absolute and relative residuals were derived and plotted. These figures will help to interpret the GSA results.

6.2.1. PROSPECT-4

To start with PROSPECT-4 (Figure 1 Top), for all three MLRA emulators the mean of the emulated spectra (blue line) is precisely on top of the mean of the original PROSPECT-4 spectra (red line underneath), which suggests a close match. It can be observed that for each of the three MLRAs the SD of the emulated spectra are seamlessly overlapping with that of PROSPECT-4 validation spectra. Only at the lower boundaries close to zero, i.e., in the visible and around 2000–2500 nm some small anomalies took place. Also the min–max boundaries are mostly similar. Only the GPR emulator does not fully reach the high RTM dynamic range in the 2000–2500 nm region.

The specific regions where anomalies occur may be better visualized in the residuals (middle panels) and relative residuals (bottom panels). Excellent residuals can be noted for NN and GPR; the mean and SD keep stable around zero. The percentiles show only fluctuations in the visible and SWIR regions related to regions with strong absorbances. KRR also keeps the mean residuals close to zero, but is characterized with considerably larger fluctuations in the visible and at the atmospheric water bands around 1900 nm, and to a lesser extent at 1400 nm. These anomalies are magnified when looking at the relative residuals, where the SD can be above 10%. Note that the larger residuals take place in the strong absorption regions where true or emulated reflectance values can approach close to zero, thereby causing large relative residuals.

6.2.2. PROSAIL

The same RTM vs emulator and residual graphs are produced for PROSAIL (Figure 2). The top panels indicate the accuracy of the reconstructed PROSAIL reflectance relative to the PROSAIL validation dataset. Mean signatures are seamlessly overlapping, although in some regions the mean RTM signature becomes partly visible next to the emulated one. Also the overlapping SD suggests that the large majority of emulated spectra closely match the original RTM spectra. Further, it can be observed at the min–max boundaries that they are not perfectly overlapping. At the max boundary, PROSAIL simulations reaches generally further than the emulators for KRR and GPR, but reverse for NN, while the emulator delivers slightly lower values at the min boundary for KRR and NN.

Regarding the residuals, although the mean residual is around zero, the broader SD and percentiles indicate that emulating PROSAIL bidirectional reflectance is not always perfectly matching the original RTM. Similar to the observations for PROSPECT-4, NN and GPR keep consistent performances, with absolute inaccuracies lower than 0.1. For KRR, anomalies can be larger, as expressed by a broader SD and some pronounced extreme cases shown by the percentile values. The relative residuals reflect larger differences, with peaks again especially in the visible and also at the atmospheric water bands centered around 1400 and 1900 nm and towards the SWIR. This is especially evident for KRR. Comparison of the residuals against the original spectral shapes (top panel) reveals that reflectance in these absorption regions is close to zero, making relative differences expanding rapidly.

6.2.3. MODTRAN5

For each of the six MODTRAN5 outputs only the most accurate emulator (see Table 2) is used. The same type of graphs are shown in Figure 3, with the overview signatures in the top panel. Contrarily to leaf and canopy simulations, atmospheric profiles are not smooth but spiky due to some atmospheric absorption features. In all cases, the mean emulated data overlap the mean RTM data suggesting that emulators are generally capable of reconstructing the MODTRAN5 profiles with great precision. The SD and min–max boundaries indicate again that the emulators are able to reconstruct the same output information as MODTRAN5 for most of the simulated spectra.

When inspecting the residuals, however, larger errors can be observed. Overall, absolute residuals are close to zero, as suggested by the mean and SD, though not for all MODTRAN5 outputs emulators perform equally stable. Particularly

E_{d i r}

and

T_{d i r}

generated somewhat more fluctuating mean residuals. In absolute terms, there are generally larger anomalies in the visible than in the SWIR spectral region (with exception of absorption regions) due to the generally higher values.

Also noteworthy are the outliers, which are especially apparent in the percentiles. These peaks are most pronounced at the deep atmospheric water vapor absorption regions centered at 1400 and 1900 nm. In these regions, as observed in the top panels, profiles are reducing to zero. Other spiky outliers are also observed thanks to the fine spectral sampling. They are related to narrow atmospheric absorption features from different molecular compounds such as O

_{2}

.

The strong atmospheric water vapor absorption regions are most noteworthy in the relative residuals. In these regions the relative residuals exceed out-of-range. This can be attributed to the very close-to-zero (

0.1 \times 10^{- 6}

) values for various bands in these regions, as well as the noisy behavior, which were mot followed by the emulator. In remote sensing applications these regions are typically discarded because of very noisy, but we decided to show them for sake of completeness.

Apart from these extreme regions, there are also remarkable general difference among the six MODTRAN5 outputs. Whereas

E_{d i f}

,

T_{d i f}

, S and

L_{0}

emulators generated relative residuals that can exceed 1000% for

E_{d i r}

and

T_{d i r}

these relative residuals kept 10% (Note scale of the Y-axis on the relative residuals plots). On the one hand, the first outputs,

E_{d i f}

,

T_{d i f}

, S and

L_{0}

, are atmospheric functions highly influenced by aerosols and molecular scattering and multiple scattering. On the other hand, direct fluxes

E_{d i r}

and transmittance

T_{d i r}

are not particularly affected by atmospheric scattering. Therefore, it seems that multiple scattering atmospheric mechanisms at high spectral resolution are more difficult to emulate.

6.3. Global Sensitivity Analysis Results

6.3.1. PROSPECT-4

For PROSPECT-4, total sensitivity (

S_{T i}

) results along the 400–2500 nm spectral range are displayed in Figure 4. The reference

S_{T i}

result is displayed in the top left panel. Because the relative contributions sum up to 100%, the role of each input variable along the spectral range can be easily inspected. For instance,

S_{T i}

results clarify that chlorophyll content (

C a b

) drive leaf reflectance in the visible part, and the structural variables N and dry matter content (

C m

) play a role across the whole spectral range. Leaf water content (

C w

) only governs leaf reflectance from around 1200 nm onwards.

When comparing the reference PROSPECT-4

S_{T i}

results against the

S_{T i}

results of the emulators (KRR, NN & GPR) their similarity is remarkable. Although all three emulators deliver very similar patterns, some small differences can be noted. Particularly for KRR some anomalies can be observed. While in the residuals small imperfections were observed in the visible and around 1900 nm, this is also reflected in the

S_{T i}

results, with tiny anomalies of

C w

in the visible and

C a b

around 1900 nm. Further differences between PROSPECT-4 and the NN and GPR emulators are more subtle. For instance, it can be noted that the structural variable N of the emulators has a somewhat more pronounced small peak in the blue than PROSPECT-4. Overall, the similarity of the figures affirm using emulators into GSA. Moreover, the processing time indicates that the emulated models generated the

S_{T i}

results about twice as fast as the original PROSPECT-4 model (which is already running fast).

6.3.2. PROSAIL

By coupling the leaf PROSPECT-4 model with the SAIL model, i.e., PROSAIL, then both leaf and canopy input variables govern the variability of TOC bidirectional reflectance.

S_{T i}

results are displayed in Figure 5 with original PROSAIL results in the top left panel. It can be noted that the prime driving variable is the canopy structural variables

L A I

and

L A D

.

L A I

quantifies leaf density and is a key variable in many surface and climate studies (e.g., [2]). The reference PROSAIL

S_{T i}

results indicate that

L A I

alone can explain up to 50% of the total variability (i.e., with interactions), especially in the SWIR (1400–3000 nm). Its dominance can be explained by two factors. Firstly,

L A I

controls the amount of leaf elements and consequently controls the amount of spectrally-distinct soil reflectance propagating through the canopy (in case of low

L A I

). And secondly, the introduction of leaf elements enables the interactions with leaf optical properties, i.e., a higher

L A I

implies more light absorption due to the properties of leaf elements and scattering effects.

L A D

controls the orientation of the leaves [15].

L A D

, like

L A I

, regulates to an extent whether bidirectional TOC reflectance originates from vegetation only (e.g., in case of erectophile leaves;

L A D

: 90

^{\circ}

) or is also mixed with soil reflectance (e.g., in case of planophile leaves;

L A D

: 0

^{\circ}

). Other drivers are soil coefficient (regulates the contribution of wet and dry soil portions) and to a lesser extent geometry (solar and view zenith angle). Also the leaf optical properties play a prominent role. The patterns of the leaf biochemical variables are similar as those for PROSPECT-4 alone, but their relative importance is suppressed by the dominance of the aforementioned key canopy variables.

When subsequently comparing reference PROSAIL

S_{T i}

results against those of the emulated

S_{T i}

results, their similarity is encouraging again. The patterns originating from the emulators match closely the results of the original PROSAIL, and are generated about three times faster. Nevertheless, likewise for PROSPECT-4, the KRR emulator delivers various small anomalies compared to the reference PROSAIL model. Similar as in case of PROSPECT-4, small contributions of

C w

and

C a b

emerge at meaningless spectral regions, e.g., some tiny

C a b

portion in the SWIR and tiny

C w

portion in the visible. Regarding the SAIL variables, the variables

S Z A

,

R A A

and also

s k y l

are given a bit too much weight. Conversely, NN and GPR emulators performed very much alike and similar to the reference PROSAIL GSA results, although GPR slightly underestimates the role of the geometry variables. A closer inspection suggests that NN preserves a slightly higher precision than GPR, e.g., no

C w

residual in the blue. Overall, these results are encouraging for using emulators in GSA experiments.

6.3.3. MODTRAN5

Evidence with PROSPECT-4 and PROSAIL suggest that emulators can perfectly replace RTMs into a GSA scheme. Likewise, the most accurate emulators for the six MODTRAN5 output functions (see Table 2) were implemented into GSA. Its processing time did not take more than a few minutes. The total sensitivity index (

S_{T i}

) allowed to decompose the MODTRAN5 outputs into its driving input variables, as shown in Figure 6.

Although a careful interpretation is required in the deep atmospheric water absorption regions, the overall trends of the atmospheric drivers across the 400–2500 nm spectral range emerged continuously in general. In all these figures, the relative importance of the atmospheric variables became apparent. Aerosol optical variables, i.e.,

A O T

,

A M S

and G seems to be key TOA radiance drivers, but some points must be outlined.

Firstly, direct components,

E_{d i r}

and

T_{d i r}

, are not influenced by variations in the G variable. Since G modules the anisotropic degree of the scattering phase function, it does not affect those atmospheric components that are not affected by the scattering process. Instead, the direct flux and transmittance are influenced by the light absorption caused by aerosols (driven by the

A O T

and

A M S

) and molecules, e.g., in the water absorption regions through

C W V

. In fact, the amount of gasses passed through the atmospheric optical path is directly linked with

S Z A

and

V Z A

. This can clearly be seen in the continuum across the wavelength range and within molecular absorption (e.g., O

_{3}

at 550 nm and O

_{2}

A at 760 nm).

Thus, direct components present a stronger dependence on the illumination and viewing geometry as well as a stronger spectral variation in

A O T

and

A M S

as compared to the analogous diffuse components,

E_{d i f}

and

T_{d i f}

. Additionally, direct components are also stronger affected by the

E L E V

variable. However, due to the large wavelength range covered in this study, from 400 to 2500 nm, this cannot be appreciated in Figure 6. The

E L E V

variable especially affects molecular absorption bands, since changes in surface elevation causes variations in the atmospheric optical path length, therefore modifying absorption bands depth. Nevertheless, in the special case of water vapor, simulations were originally done by disentangling the

C W V

variable from the

E L E V

variable to avoid imposed correlations between these two parameters.

Secondly, the diffuse components,

E_{d i f}

and

T_{d i f}

, which account for the multiple scattering produced in the atmosphere, are basically driven by the aerosol load. In a physical sense, the

A M S

exponent is inversely related to the average size in the aerosols, being the smaller the particles the larger the exponent. In a mathematical sense, the Angström law describes the spectral dependence of

A O T

, relating the

A O T

at a fixed wavelength (in this study at 550 nm) to the

A O T

at the rest of wavelengths. Thus, in TOA radiance the effect of varying the

A M S

exponent is similar to vary the slope of the TOA radiance spectrum around a fixed wavelength (550 nm). This trend can be observed in each of the atmospheric output functions, where at 550 nm AMS has no influence, and its impact increases as it moves away from this wavelength. Therefore, while the impact of

A O T

and G in

S_{T i}

decreases with wavelength,

A M S

increases as it moves away from the reference wavelength.

Thirdly, the spherical albedo term, S, accounts for the back-scattering from the atmosphere to the surface. Driving variables in S are those regarding the scattering effect, i.e., the aerosols optical variables (

A O T

,

A M S

and G), as it was expected. Finally, as the S is diffusely transmitted, it has no dependence on the illumination nor acquisition geometry.

Finally, the path radiance

L_{0}

, describes the scattering produced by the atmosphere before light arrives to the surface. Contrarily to the absorption, scattering is the dominant effect in

L_{0}

. Consequently, the G variable has a strong influence. However, the importance of the observation and illumination geometry, which determines the optical path length, seems to be equal or even more important. Not only on

S Z A

and

V Z A

, but also on the relative azimuth angle (

R A A

) between the sensor and the illumination source is absolutely crucial to describe

L_{0}

.

Apart from the general trend, the figures are characterized by local tiny spikes. This especially holds in the atmospheric water vapor absorption regions. The emulator showed unacceptably poor relative residuals in these areas so these spikes is treated as noise and should be ignored.

7. Discussion

Let us now discuss on the most important messages conveyed in this work, and the implications on further studies for remote sensing data analysis, sensor characterization, scene generation, and model inversion.

7.1. Interpreting Emulator Results

Emulation is a technique used to estimate model simulations when the model under investigation is too computationally burdensome to be run many times [34]. Following on initial successful experiments that MLRAs can function as emulators to approximate the behavior of physically-based RTMs [37,38], here the concept of emulating RTMs has been further explored. Specifically, emulators approximating popular RTMs at the leaf, canopy and atmosphere scale were used as input into a GSA scheme. Variance-based GSA methods are excellent tools to identify driving variables in process-based models such as RTMs, but they require many simulations. When models are computationally costly, this method can become cumbersome, and so far only fast RTMs were subject to GSA studies. The excellent emulation-GSA results as compared to the original RTM-GSA results for PROSPECT-4 and PROSAIL suggest that emulators can open many practical applications. It demands for a closer inspection of their properties.

When it comes to emulating RTM spectral ouptuts, an important aspect involves dealing with a large amount of multi-outputs, e.g., up to 2101 wavebands to construct the 400–2500 nm profiles at 1 nm. It made that existing emulation packages (e.g., BACCO (Bayesian Analysis of Computer Code Outputs) or GEM-SA (Gaussian Emulation Machine for Sensitivity Analysis) [34,72,73]), as they process only one output, are impractical for RTM sensitivity analysis. To cope with such a high degree of multiple-output, we had implemented a PCA dimensionality reduction procedure into ARTMO’s Emulator toolbox and linked it with the GSA toolbox. The Emulator toolbox is essentially nothing more than a collection of multivariate regression algorithms that builds a relationship between RTM input variables and spectral output. Any multivariate regression algorithm can potentially function as emulator. However, the regression methods should be able to implement nonlinear and smooth (regularized) functions, in order to deal with complex, possibly ill-posed input-output relations. Earlier experiments on emulation revealed that nonparametric regression methods limited to linear transformations, such as partial least square regression (PLSR), are unable to reconstruct spectral signals with sufficient accuracy [37], while nonlinear regression methods in the families of NNs and kernel-based MLRAs have proven to be powerful adaptive estimators (e.g., [5,37,46]).

In this study three potentially powerful MLRAs (KRR, NN and GPR) were evaluated to act as emulators on their accuracy accuracy, computational burden and correlation of the residuals. Goodness-of-fit validation results indicated that relative errors (i.e., NRMSE) were below 0.5%. Nevertheless, although trained extremely fast (about 2 s), KRR performed mostly worse, partly due to the low number of hyperparemeters used in defining the kernel function. NN performed generally superior but this method requires a considerably longer training phase, here up to 14 min, which may be unacceptably long when dealing with computationally cheap RTMs. Consequently, for applications based on PROSPECT and SAIL, most likely it does not pay off to emulate them with NN, e.g., here the original PROSPECT and PROSAIL GSA procedures were completed faster than the NN training plus NN-GSA running. In turn, a training phase in the order of minutes may not be trivial when it comes to costly RTMs. For instance, one single MODTRAN5 simulation at finest spectral resolution of 0.1 cm

^{- 1}

may take over 10 min on a contemporary computer. GPR may function as an sensible compromise; it is relatively fast in training (here less than 2 min), fast in testing, and with accuracies generally close to those of NN.

Moreover, GPR possesses additional properties which were not explored in this paper and that explain its first choice for emulation (e.g., [34,74]). GPR can give some information about the relative relevance of the inputs, it can provide associated uncertainty intervals (referred to as code uncertainty of the emulator) for the predictions, and the Jacobian and Hessians of the prediction function can be explicitly derived [50,75]. The uncertainty estimations can be of interest as they provide a reliable indication of the trustworthiness of the generated outputs. For instance, in regions where the RTM is not evaluated, we are uncertain about what the true simulator would introduce, and thus uncertainties are larger than within the training variable space. Accordingly, code uncertainty can be reduced by increasing our knowledge of the RTM, i.e., by increasing the training sample size [76]. In fact, the uncertainty intervals can function as a quality check, e.g., accepting only emulations within a confidence threshold. Nevertheless, it should be noted that preparatory experiments with PROSPECT and SAIL revealed that these uncertainties are generally so small that they were hardly apparent (results not shown), as has also been recently observed by [38].

However, this does not mean that emulators are free from limitations. Emulators are meta-models, i.e., model of a model, implying that they only approximate the functioning of the simulator, i.e., the RTM. It bears the following consequences: (1) the emulator can only be as good as the simulator, i.e., imperfections in the simulator are mirrored to the emulator; (2) the accuracy of the emulator depend on the type of machine learning regression and the number and representativity of the training samples; and (3) emulators uncertainty increases when moving beyond the boundaries of the training dataset so accurate extrapolation is compromised. All in all, a careful training and validation of emulators is strictly necessary, and associated uncertainty intervals may help deducing whether the generated output is within reasonable and physically meaningful levels.

7.2. Interpreting Sensitivity Analysis Results

Machine learning emulators provided accurate estimations and reproduced the underlying mechanistic principles encoded in the RTMs, and allowed for intensive simulations and remote sensing applications. By applying Saltelli’s GSA scheme, we studied the total sensitivity provided by the emulators compared against RTM reference results for PROSPECT-4 and PROSAIL. Results demonstrated that the emulator-GSA setup delivers identical sensitivity patterns as the original RTM GSA scheme. Nevertheless, given that these two RTMs are computationally cheap, the gain in speed to deliver the sensitivity results is not very large. If computationally cheap, obviously, applying GSA to the original RTMs is always preferred. Conversely, the use of emulators can become a powerful tool to bypass the computational burden of expensive RTMs. For instance, the emulator-GSA setup allows for the first time decomposing MODTRAN5 outputs into its driving input variables across the spectral range. Here, for multiple MODTRAN5 atmospheric transfer functions (i.e., MODTRAN5 outputs) the emulator-GSA setup quantified the key variables within reasonable time, i.e., on the order of minutes. To put in perspective, conducting the same analysis with original MODTRAN5 simulations would have taken about 47 days on the same computer.

The MODTRAN5 emulation-GSA

S_{T i}

results showed that the relative importance of each input variable in the atmospheric transfer functions were physically meaningful: e.g., effect of water vapor in these absorption bands, effect of scattering variables in diffuse irradiance and transmittance. However, unlike leaf and canopy surface reflectance, the spectrum of the atmospheric transfer functions is affected by many narrow atmospheric and solar absorption bands on top of a smooth curve. The analysis of the residuals indicates that the emulators performed well in the smoother regions of the spectrum, while they showed large relative errors within the narrow atmospheric absorption regions with values close to zero. Most noticeable are the dominant water vapor absorption regions around 1400 and 1900 nm, which are typically discarded in further processing. In fact, we had repeated the emulation-GSA analysis but then without those absorption regions (results not shown). It led to the same sensitivity patterns but then with gaps where the data was removed. However, the relevant information that these regions are largely driven by CWV (especially for

E_{d i r}

and

T_{d i r}

) would then be omitted. Despite high uncertainties, we therefore preferred to include the whole spectrum for sake of completeness. To analyze atmospheric drivers further, we consider as further work to dedicate different emulators (with dedicated principal components) for distinctive absorption regions to better capture high frequency spectral details. Another future sensitivity analysis would include more MODTRAN input variables. While here only the most relevant atmospheric variables are studied, it is of interest to disentangle the role of other variables such as single scattering albedo or even the variables that allow modelling the aerosol vertical distribution as a bi-modal, log-normal or gamma function [77].

For practical remote sensing applications, the PROSAIL and MODTRAN

S_{T i}

results suggest that not all input variables are equally important in shaping spectral output. A few variables possess hardly any sensitivity, which implies they can be safely kept to default values e.g.,

s k y l

and

R A A

for PROSAIL. In the particular case of MODTRAN5, the driving input variables must be separately studied for each atmospheric function. On the whole,

E L E V

and also

R A A

(apart from

L_{0}

) have hardly impact. However, more importantly, some variables play only a role in specific absorption regions, such as

C W V

. This suggests that

C W V

can be kept as constant e.g., when interested in analyzing scattering effects in the blue region. In fact,

E L E V

and sun-target-sensor geometry (

V Z A

,

S Z A

and

R A A

) are typically known and thus kept fixed in applications. Similarly, by discarding insensitive variables from the sampling scheme it is possible to simplify the computational load and inversion problem for mapping applications (e.g., as in [26,78]).

Finally, it must be noted that the RTM input variables are independent, i.e., no physical links between them have been introduced, although in reality they can be correlated. For instance, in case of MODTRAN5 the impact of the

E L E V

variable appears negligible for all atmospheric functions. However, in reality surface elevation drives the atmospheric optical path, and

E L E V

is therefore to some extent correlated with the amount of the atmospheric absorbers (gaseous components) such as water vapor. Similarly, in reality leaf and canopy variables are to some extent correlated, e.g.,

C a b

occurs with

C w

, and the existence leaf optical properties automatically imply the existence of leaves thus

L A I

(e.g., [79]). Accordingly, when aiming for practical applications from sensitivity studies it is recommendable to filter out input combinations that are unlikely to occur in reality.

7.3. New Processing Opportunities with Emulators

Probably the most significant advantage of emulators is the tremendous gain in processing speed. Each of the tested emulators delivers outputs quasi-instantly, which implies boosting in processing speed in the orders of tens to ten thousands depending on the speed of the original RTM. For instance, for the computationally intensive atmospheric RTM MODTRAN, the gain is about 130,000. At the same time, they hardly take memory space, only a few model coefficients are stored for prediction. Consequently, emulators can become an attractive technique for a diversity of remote sensing applications beyond GSA studies, which are briefly outlined in what follows.

Numerical inversion. Iterative optimization is a classical technique to invert RTMs in RS [80,81]. The optimization consists in minimizing a cost function, which estimates the difference between measured and estimated variables by successive input variable iteration. The iterative optimization algorithms are computationally demanding and hence time-consuming when large remotely-sensed datasets are inverted. This method has never been a viable option for computationally intensive RTMs, and even for computationally cheap RTMs the method is slow and not appealing. Making use of pre-generated LUTs is preferred in inversion schemes (see [5] for a review). For instance, canopy RTM LUTs are used in the MODIS LAI retrieval products [82] and also atmospheric correction algorithms make use of LUT-based inversion methods (e.g., [83,84,85,86,87]). Accordingly, when replacing the original RTMs by their emulated counterparts then numerical inversion can again become an attractive alternative; numerical inversion using a RTM-like emulator can go very fast. It would bypass the need to invest in large and heavy LUTs. To explore this research line further, a so-called Numerical Inversion Toolbox is currently under construction that will make use of emulators.

Scene generation. Forthcoming optical satellite missions increasingly make use of end-to-end mission performance simulators (E2ES) to generate simulated scenes as would be detected by the optical sensor (e.g., [9,88]). A drawback is that scene rendering can take a long processing time, especially when based on computationally expensive models such as SCOPE or MODTRAN [89]. Accordingly, when replacing the original RTMs by their emulated counterparts then scenes can be generated rapidly. Within the framework of ESA’s forthcoming 8th Earth Explorer FLEX, currently it is being investigated whether some tedious parts of the FLEX-E2ES scene generation can be replaced by emulators.

Data assimilation schemes. Data assimilation schemes where multiple data sources and models are coupled are more commonly applied in climate and Earth system studies. Often they use advanced data fusion and processing techniques, e.g., Ensemble Kalman Filter and Markov Chain Monte Carlo methods (e.g., [90,91,92]) that can take considerable computational time and so the limiting factor in the processing chain. To mitigate the processing bottlenecks in assimilation schemes, it has been proposed to replace parts of deterministic input-output processing chain by emulators. Typically the most computationally burdensome parts, although in does not have be one emulator attempting to emulate the complete system, but rather replacing tedious parts by multiple smaller but highly accurate emulators. Efforts in this direction are undertaken by [38,93].

Improved emulators. From a pure machine learning point of view, emulators boil down to developing regression algorithms that should be accurate enough when training with relatively few-to-moderate available observations. The scenario calls for the concept of regularization, which is tightly related to invariance encoding and incorporation of prior knowledge and the definition of sensible cost functions. Many opportunities appear here to improve the performance of emulators: one could think of including multiple pieces of information in the regression algorithm with multimodal/multiresolution regression, e.g., by combining RTMs for the same problem, to accommodate spatial or temporal relations in the emulation [44,94], and to implement better dimensionality reduction techniques beyond linear PCA to deal with the multi-output problem [95]. Apart from these improvements in the regression algorithm, we raise here the important issue of assessment of the emulator function, e.g., by looking at the Jacobian and Hessian of the transformation [38,96], Bayesian sensitivity analysis [34,97], as well as developing emulators that may deal with coupled RTMs and transformations of coefficients [50].

8. Conclusions

Emulators are statistical constructs that approximate the functioning of process-based models such as RTMs. They provide great savings in memory and tremendous gains in processing speed, while yielding competitive accuracies in reconstructing RTM outputs. This emulation technique opens many new research and practical opportunities, not in the least enabling to convert computationally expensive RTMs into computationally fast surrogate models.

As a proof-of-concept, we demonstrated the use of emulated RTMs into a global sensitivity analysis (GSA) study. We analyzed three MLRAs (NN, KRR and GPR) on their ability to function as an emulator to approximate spectral outputs of the leaf optical RTM PROSPECT-4, the canopy RTM PROSAIL and the computational expense atmospheric RTM MODTRAN5. A PCA preprocessing step was introduced to speed up processing and enable deconstructing the full spectral profile. Particularly NN and GPR proved to act as accurate emulators, with normalized root-mean-squared errors below 0.2% for the majority of RTM validation datasets. The PROSPECT-4 and PROSAIL-like emulators were subsequently implemented into a GSA scheme. Because of delivering accurate RTM reflectance outputs, NN and GPR emulators delivered GSA total order sensitivity (

S_{T i}

) results (including interactions) identical as the reference

S_{T i}

results coming from original PROSPECT-4 and PROSAIL simulations.

These two studies successfully demonstrated the benefit and reliability of using emulators into GSA studies. As a next step, the most accurate MODTRAN5-like emulators for six MODTRAN5 atmospheric transfer outputs (

E_{d i f}

,

E_{d i r}

,

T_{d i f}

,

T_{d i r}

, S and

L_{0}

) were implemented into the GSA scheme. The analysis along the 400–2500 nm spectral range did not take more than a few minutes. Key driving atmospheric variables were identified. Particularly the aerosol optical variables are mostly driving the entire spectral range. While

A O T

and

A M S

govern all atmospheric outputs, the asymmetry coefficient of scattering G, only impacts those outputs diffusely transmitted. On the contrary, molecular compounds such as

C W V

, which only affects specific absorption regions, present a stronger effect on those outputs that are directly transmitted.

Altogether, these GSA studies at the leaf, canopy and atmosphere scale demonstrated that the emulator technique, in which both consistent physical assumptions and statistical learning live together, can open many new opportunities in the use of advanced RTMs for practical applications.

Acknowledgments

This work was partially supported by the European Space Agency under project ’FLEX-Bridge Study’ (ESA contract RFP IPL-PEO/FF/lf/14.687), the ERDF and the Spanish Ministry of Economy and Competitiveness under project TIN2015-64210-R. Gustau Camps-Valls wants to thank the European Research Council (ERC) funding under the ERC-CoG-2014 SEDAL under grant agreement 647423. We also thank José Gómez-Dans for valuable discussions on emulation. We thank the four reviewers for their valuable suggestions.

Author Contributions

J.V., N.S. and J.V. designed and carried out the experiments, and wrote the manuscript. J.P.R. developed the Emulator toolbox. J.M.-M., G.C.-V. and J.M. contributed to the drafting of the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zhang, Y.; Rossow, W.; Lacis, A.; Oinas, V.; Mishchenko, M. Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data. J. Geophys. Res. D Atmos. 2004, 109. [Google Scholar] [CrossRef]
Jacquemoud, S.; Verhoef, W.; Baret, F.; Bacour, C.; Zarco-Tejada, P.; Asner, G.; François, C.; Ustin, S. PROSPECT + SAIL models: A review of use for vegetation characterization. Remote Sens. Environ. 2009, 113, S56–S66. [Google Scholar] [CrossRef]
Deiveegan, M.; Balaji, C.; Venkateshan, S. A polarized microwave radiative transfer model for passive remote sensing. Atmos. Res. 2008, 88, 277–293. [Google Scholar] [CrossRef]
Van der Tol, C.; Berry, J.A.; Campbell, P.K.E.; Rascher, U. Models of fluorescence and photosynthesis for interpreting measurements of solar-induced chlorophyll fluorescence. J. Geophys. Res. Biogeosci. 2014, 119, 2312–2327. [Google Scholar] [CrossRef] [PubMed]
Verrelst, J.; Camps-Valls, G.; Muñoz-Marí, J.; Rivera, J.; Veroustraete, F.; Clevers, J.; Moreno, J. Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties—A review. ISPRS J. Photogramm. Remote Sens. 2015, 108, 273–290. [Google Scholar] [CrossRef]
Verhoef, W.; Bach, H. Simulation of Sentinel-3 images by four-stream surface-atmosphere radiative transfer modeling in the optical and thermal domains. Remote Sens. Environ. 2012, 120, 197–207. [Google Scholar] [CrossRef]
Meharrar, K.; Bachari, N. Modelling of radiative transfer of natural surfaces in the solar radiation spectrum: Development of a satellite data simulator (SDDS). Int. J. Remote Sens. 2014, 35, 1199–1216. [Google Scholar] [CrossRef]
Segl, K.; Guanter, L.; Rogass, C.; Kuester, T.; Roessner, S.; Kaufmann, H.; Sang, B.; Mogulsky, V.; Hofer, S. EeteSThe EnMAP end–to–end simulation tool. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 522–530. [Google Scholar] [CrossRef]
Vicent, J.; Sabater, N.; Tenjo, C.; Acarreta, J.; Manzano, M.; Rivera, J.; Jurado, P.; Franco, R.; Alonso, L.; Verrelst, J.; et al. FLEX End-to-End Mission Performance Simulator. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4215–4223. [Google Scholar] [CrossRef]
Widlowski, J.L.; Taberner, M.; Pinty, B.; Bruniquel-Pinel, V.; Disney, M.; Fernandes, R.; Gastellu-Etchegorry, J.P.; Gobron, N.; Kuusk, A.; Lavergne, T.; et al. Third Radiation Transfer Model Intercomparison (RAMI) exercise: Documenting progress in canopy reflectance models. J. Geophys. Res. Atmos. 2007, 112. [Google Scholar] [CrossRef] [Green Version]
Baret, F.; Buis, S. Estimating canopy characteristics from remote sensing observations. Review of methods and associated problems. In Advances in Land Remote Sensing: System, Modeling, Inversion and Application; Liang, S., Ed.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 171–200. [Google Scholar]
Widlowski, J.L.; Pinty, B.; Lopatka, M.; Atzberger, C.; Buzica, D.; Chelle, M.; Disney, M.; Gastellu-Etchegorry, J.P.; Gerboles, M.; Gobron, N.; et al. The fourth radiation transfer model intercomparison (RAMI-IV): Proficiency testing of canopy reflectance models with ISO-13528. J. Geophys. Res. Atmos. 2013, 118, 6869–6890. [Google Scholar] [CrossRef]
Widlowski, J.L.; Mio, C.; Disney, M.; Adams, J.; Andredakis, I.; Atzberger, C.; Brennan, J.; Busetto, L.; Chelle, M.; Ceccherini, G.; et al. The fourth phase of the radiative transfer model intercomparison (RAMI) exercise: Actual canopy scenarios and conformity testing. Remote Sens. Environ. 2015, 169, 418–437. [Google Scholar] [CrossRef]
Feret, J.B.; François, C.; Asner, G.P.; Gitelson, A.A.; Martin, R.E.; Bidel, L.P.R.; Ustin, S.L.; le Maire, G.; Jacquemoud, S. PROSPECT-4 and 5: Advances in the leaf optical properties model separating photosynthetic pigments. Remote Sens. Environ. 2008, 112, 3030–3043. [Google Scholar] [CrossRef]
Verhoef, W. Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model. Remote Sens. Environ. 1984, 16, 125–141. [Google Scholar] [CrossRef]
Vermote, E.; Tanré, D.; Deuzé, J.; Herman, M.; Morcrette, J.J. Second simulation of the satellite signal in the solar spectrum, 6S: An overview. IEEE Trans. Geosci. Remote Sens. 1997, 35, 675–686. [Google Scholar] [CrossRef]
Lipton, A.; Moncet, J.L.; Boukabara, S.A.; Uymin, G.; Quinn, K. Fast and accurate radiative transfer in the microwave with optimum spectral sampling. IEEE Trans. Geosci. Remote Sens. 2009, 47, 1909–1917. [Google Scholar] [CrossRef]
Govaerts, Y.M.; Verstraete, M.M. Raytran: A Monte Carlo ray-tracing model to compute light scattering in three-dimensional heterogeneous media. IEEE Trans. Geosci. Remote Sens. 1998, 36, 493–505. [Google Scholar] [CrossRef]
North, P. Three-dimensional forest light interaction model using a Monte Carlo method. IEEE Trans. Geosci. Remote Sens. 1996, 34, 946–956. [Google Scholar] [CrossRef]
Lewis, P. Three-dimensional plant modelling for remote sensing simulation studies using the Botanical Plant Modelling System. Agronomie 1999, 19, 185–210. [Google Scholar] [CrossRef]
Gastellu-Etchegorry, J.; Demarez, V.; Pinel, V.; Zagolski, F. Modeling radiative transfer in heterogeneous 3-D vegetation canopies. Remote Sens. Environ. 1996, 58, 131–156. [Google Scholar] [CrossRef]
Petropoulos, G.; Wooster, M.; Carlson, T.; Kennedy, M.; Scholze, M. A global Bayesian sensitivity analysis of the 1D SimSphere soil vegetation atmospheric transfer (SVAT) model using Gaussian model emulation. Ecol. Model. 2009, 220, 2427–2440. [Google Scholar] [CrossRef]
Van Der Tol, C.; Verhoef, W.; Timmermans, J.; Verhoef, A.; Su, Z. An integrated model of soil-canopy spectral radiances, photosynthesis, fluorescence, temperature and energy balance. Biogeosciences 2009, 6, 3109–3129. [Google Scholar] [CrossRef]
Berk, A.; Anderson, G.; Acharya, P.; Bernstein, L.; Muratov, L.; Lee, J.; Fox, M.; Adler-Golden, S.; Chetwynd, J.; Hoke, M.; et al. MODTRAN^TM5: 2006 update. In Proceedings of the Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XII, Orlando, FL, USA, 17 April 2006; Volume 6233.
España, M.; Baret, F.; Aries, F.; Chelle, M.; Andrieu, B.; Prévot, L. Modeling maize canopy 3D architecture: Application to reflectance simulation. Ecol. Model. 1999, 122, 25–43. [Google Scholar] [CrossRef]
Mousivand, A.; Menenti, M.; Gorte, B.; Verhoef, W. Multi-temporal, multi-sensor retrieval of terrestrial vegetation properties from spectral-directional radiometric data. Remote Sens. Environ. 2015, 158, 311–330. [Google Scholar] [CrossRef]
Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, M.; Tarantola, S. Global Sensitivity Analysis: The Primer; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
Yang, J. Convergence and uncertainty analyses in Monte-Carlo based sensitivity analysis. Environ. Model. Softw. 2011, 26, 444–457. [Google Scholar] [CrossRef]
Nossent, J.; Elsen, P.; Bauwens, W. Sobol’sensitivity analysis of a complex environmental model. Environ. Model. Softw. 2011, 26, 1515–1525. [Google Scholar] [CrossRef]
Bowyer, P.; Danson, F. Sensitivity of spectral reflectance to variation in live fuel moisture content at leaf and canopy level. Remote Sens. Environ. 2004, 92, 297–308. [Google Scholar] [CrossRef]
Stuckens, J.; Verstraeten, W.W.; Delalieux, S.; Swennen, R.; Coppin, P. A dorsiventral leaf radiative transfer model: Development, validation and improved model inversion techniques. Remote Sens. Environ. 2009, 113, 2560–2573. [Google Scholar] [CrossRef]
Mousivand, A.; Menenti, M.; Gorte, B.; Verhoef, W. Global sensitivity analysis of the spectral radiance of a soil–vegetation system. Remote Sens. Environ. 2014, 145, 131–144. [Google Scholar] [CrossRef]
Verrelst, J.; Rivera, J.; Van Der Tol, C.; Magnani, F.; Mohammed, G.; Moreno, J. Global sensitivity analysis of the SCOPE model: What drives simulated canopy-leaving sun-induced fluorescence? Remote Sens. Environ. 2015, 166, 8–21. [Google Scholar] [CrossRef]
O’Hagan, A. Bayesian analysis of computer code outputs: A tutorial. Reliab. Eng. Syst. Saf. 2006, 91, 1290–1300. [Google Scholar] [CrossRef]
Rohmer, J.; Foerster, E. Global sensitivity analysis of large-scale numerical landslide models based on Gaussian-Process meta-modeling. Comput. Geosci. 2011, 37, 917–927. [Google Scholar] [CrossRef]
Bounceur, N.; Crucifix, M.; Wilkinson, R. Global sensitivity analysis of the climate–vegetation system to astronomical forcing: An emulator-based approach. Earth Syst. Dyn. Discuss. 2014, 5, 901–943. [Google Scholar] [CrossRef]
Rivera, J.P.; Verrelst, J.; Gómez-Dans, J.; Muñoz Marí, J.; Moreno, J.; Camps-Valls, G. An Emulator Toolbox to Approximate Radiative Transfer Models with Statistical Learning. Remote Sens. 2015, 7, 9347. [Google Scholar] [CrossRef]
Gómez-Dans, J.L.; Lewis, P.E.; Disney, M. Efficient Emulation of Radiative Transfer Codes Using Gaussian Processes and Application to Land Surface Parameter Inferences. Remote Sens. 2016, 8, 119. [Google Scholar] [CrossRef]
Carnevale, C.; Finzi, G.; Guariso, G.; Pisoni, E.; Volta, M. Surrogate models to compute optimal air quality planning policies at a regional scale. Environ. Model. Softw. 2012, 34, 44–50. [Google Scholar] [CrossRef]
Villa-Vialaneix, N.; Follador, M.; Ratto, M.; Leip, A. A comparison of eight metamodeling techniques for the simulation of N₂O fluxes and N leaching from corn crops. Environ. Model. Softw. 2012, 34, 51–66. [Google Scholar] [CrossRef] [Green Version]
Castelletti, A.; Galelli, S.; Ratto, M.; Soncini-Sessa, R.; Young, P. A general framework for dynamic emulation modelling in environmental problems. Environ. Model. Softw. 2012, 34, 5–18. [Google Scholar] [CrossRef]
Lee, L.; Pringle, K.; Reddington, C.; Mann, G.; Stier, P.; Spracklen, D.; Pierce, J.; Carslaw, K. The magnitude and causes of uncertainty in global model simulations of cloud condensation nuclei. Atmos. Chem. Phys. 2013, 13, 8879–8914. [Google Scholar] [CrossRef]
Ireland, G.; Petropoulos, G.; Carlson, T.; Purdy, S. Addressing the ability of a land biosphere model to predict key biophysical vegetation characterisation parameters with Global Sensitivity Analysis. Environ. Model. Softw. 2015, 65, 94–107. [Google Scholar] [CrossRef]
Camps-Valls, G.; Bruzzone, L. (Eds.) Kernel Methods for Remote Sensing Data Analysis; Wiley & Sons Ltd.: Chichester, UK, 2009.
Razavi, S.; Tolson, B.A.; Burn, D.H. Numerical assessment of metamodelling strategies in computationally intensive optimization. Environ. Model. Softw. 2012, 34, 67–86. [Google Scholar] [CrossRef]
Verrelst, J.; Muñoz, J.; Alonso, L.; Delegido, J.; Rivera, J.; Camps-Valls, G.; Moreno, J. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3. Remote Sens. Environ. 2012, 118, 127–139. [Google Scholar] [CrossRef]
Rivera, J.; Verrelst, J.; Delegido, J.; Veroustraete, F.; Moreno, J. On the semi-automatic retrieval of biophysical parameters based on spectral index optimization. Remote Sens. 2014, 6, 4924–4951. [Google Scholar] [CrossRef]
Wold, S.; Esbensen, K.; Geladi, P. Principal component analysis. Chemom. Intell. Lab. Syst. 1987, 2, 37–52. [Google Scholar] [CrossRef]
Camps-Valls, G.; Muñoz-Marí, J.; Gómez-Chova, L.; Guanter, L.; Calbet, X. Nonlinear statistical retrieval of atmospheric profiles from MetOp-IASI and MTG-IRS infrared sounding data. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1759–1769. [Google Scholar] [CrossRef]
Camps-Valls, G.; Verrelst, J.; Muñoz-Marí, J.; Laparra, V.; Mateo-Jiménez, F.; Gómez-Dans, J. A Survey on Gaussian Processes for Earth Observation Data Analysis. IEEE Geosci. Remote Sens. Mag. 2016, 4, 58–78. [Google Scholar] [CrossRef]
Camps-Valls, G.; Gómez-Chova, L.; Muñoz-Marí, J.; Lázaro-Gredilla, M.; Verrelst, J. SimpleR: A Simple Educational Matlab Toolbox for Statistical Regression, 2013. Available online: http://isp.uv.es/softregression.html (accessed on 18 August 2016).
Sobol’, I.M. On sensitivity estimation for nonlinear mathematical models. Mat. Model. 1990, 2, 112–118. [Google Scholar]
McRae, G.J.; Tilden, J.W.; Seinfeld, J.H. Global sensitivity analysis—A computational implementation of the Fourier amplitude sensitivity test (FAST). Comput. Chem. Eng. 1982, 6, 15–25. [Google Scholar] [CrossRef]
Saltelli, A.; Annoni, P.; Azzini, I.; Campolongo, F.; Ratto, M.; Tarantola, S. Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index. Comput. Phys. Commun. 2010, 181, 259–270. [Google Scholar] [CrossRef]
Song, X.; Bryan, B.A.; Paul, K.I.; Zhao, G. Variance-based sensitivity analysis of a forest growth model. Ecol. Model. 2012, 247, 135–143. [Google Scholar] [CrossRef]
Saltelli, A.; Annoni, P. How to avoid a perfunctory sensitivity analysis. Environ. Model. Softw. 2010, 25, 1508–1517. [Google Scholar] [CrossRef]
Saltelli, A. Making best use of model evaluations to compute sensitivity indices. Comput. Phys. Commun. 2002, 145, 280–297. [Google Scholar] [CrossRef]
Verhoef, W.; Jia, L.; Xiao, Q.; Su, Z. Unified optical-thermal four-stream radiative transfer theory for homogeneous vegetation canopies. IEEE Trans. Geosci. Remote Sens. 2007, 45, 1808–1822. [Google Scholar] [CrossRef]
Wang, P.; Liu, K.Y.; Cwik, T.; Green, R. MODTRAN on supercomputers and parallel computers. Parallel Comput. 2002, 28, 53–64. [Google Scholar] [CrossRef]
Stamnes, K.; Tsay, S.C.; Wiscombe, W.; Jayaweera, K. Numerically stable algorithm for discrete-ordinate-method radiative transfer in multiple scattering and emitting layered media. Appl. Opt. 1988, 27, 2502–2509. [Google Scholar] [CrossRef] [PubMed]
Goody, R.; West, R.; Chen, L.; Crisp, D. The correlated-k method for radiation calculations in nonhomogeneous atmospheres. J. Quant. Spectrosc. Radiat. Transf. 1989, 42, 539–550. [Google Scholar] [CrossRef]
Berk, A.; Anderson, G.; Acharya, P.; Shettle, E. MODTRAN 5.2.1 User’s Manual; Spectral Science Inc.: Burlingtonm, MA, USA, 2011. [Google Scholar]
Guanter, L.; Richter, R.; Kaufmann, H. On the application of the MODTRAN4 atmospheric radiative transfer code to optical Remote Sensing. Int. J. Remote Sens. 2009, 30, 1407–1424. [Google Scholar] [CrossRef]
Richter, R.; Schläpfer, D. Geo-atmospheric processing of airborne imaging spectrometry data. Part 2: Atmospheric/topographic correction. Int. J. Remote Sens. 2002, 23, 2631–2649. [Google Scholar] [CrossRef]
Cooley, T.; Anderson, G.; Felde, G.; Hoke, M.; Ratkowski, A.; Chetwynd, J.; Gardner, J.; Adler-Golden, S.; Matthew, M.; Berk, A.; et al. FLAASH, a MODTRAN4-based atmospheric correction algorithm, its applications and validation. In IEEE International Geoscience and Remote Sensing Symposium; IEEE: Piscataway, NJ, USA, 2002; Volume 3, pp. 1414–1418. [Google Scholar]
Guanter, L. New Algorithms for Atmospheric Correction and Retrieval of Biophysical Parameters in Earth Observation: Application to ENVISAT/MERIS Data, 2006. Available online: http://hdl.handle.net/10803/9877 (accessed on 18 August 2016).
Berk, A.; Acharya, P.; Anderson, G.; Gossage, B. Recent developments in the MODTRAN atmospheric model and implications for hyperspectral compensation. In 2009 IEEE International Geoscience and Remote Sensing Symposium; IEEE: Piscataway, NJ, USA, 2009; Volume 2, pp. II262–II265. [Google Scholar]
Verrelst, J.; Romijn, E.; Kooistra, L. Mapping vegetation density in a heterogeneous river floodplain ecosystem using pointable CHRIS/PROBA data. Remote Sens. 2012, 4, 2866–2889. [Google Scholar] [CrossRef]
Verrelst, J.; Rivera, J.; Moreno, J. ARTMO’s Global Sensitivity Analysis (GSA) toolbox to quantify driving variables of leaf and canopy radiative transfer models. EARSeL eProc. 2015, 14, 1–11. [Google Scholar]
McKay, M.; Beckman, R.; Conover, W. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 1979, 21, 239–245. [Google Scholar]
Conti, S.; O’Hagan, A. Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 2010, 140, 640–651. [Google Scholar] [CrossRef]
Kennedy, M.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 2001, 63, 425–450. [Google Scholar] [CrossRef]
Hankin, R.K. Introducing BACCO, an R package for Bayesian analysis of computer code output. J. Stat. Softw. 2005, 14, 1–21. [Google Scholar] [CrossRef]
Conti, S.; Gosling, J.; Oakley, J.; O’Hagan, A. Gaussian process emulation of dynamic computer codes. Biometrika 2009, 96, 663–676. [Google Scholar] [CrossRef]
Verrelst, J.; Alonso, L.; Camps-Valls, G.; Delegido, J.; Moreno, J. Retrieval of vegetation biophysical parameters using Gaussian process techniques. IEEE Trans. Geosci. Remote Sens. 2012, 50, 1832–1843. [Google Scholar] [CrossRef]
Verrelst, J.; Rivera, J.; Moreno, J.; Camps-Valls, G. Gaussian processes uncertainty estimates in experimental Sentinel-2 LAI and leaf chlorophyll content retrieval. ISPRS J. Photogramm. Remote Sens. 2013, 86, 157–167. [Google Scholar] [CrossRef]
Hansen, J.E.; Travis, L.D. Light scattering in planetary atmospheres. Space Sci. Rev. 1974, 16, 527–610. [Google Scholar] [CrossRef]
Verrelst, J.; van der Tol, C.; Magnani, F.; Sabater, N.; Rivera, J.; Mohammed, G.; Moreno, J. Evaluating the predictive power of sun-induced chlorophyll fluorescence to estimate net photosynthesis of vegetation canopies: A SCOPE modeling study. Remote Sens. Environ. 2016, 176, 139–151. [Google Scholar] [CrossRef]
Ollinger, S. Sources of variability in canopy reflectance and the convergent properties of plants. New Phytol. 2011, 189, 375–394. [Google Scholar] [CrossRef] [PubMed]
Jacquemoud, S.; Baret, F.; Andrieu, B.; Danson, F.; Jaggard, K. Extraction of vegetation biophysical parameters by inversion of the PROSPECT + SAIL models on sugar beet canopy reflectance data. Application to TM and AVIRIS sensors. Remote Sens. Environ. 1995, 52, 163–172. [Google Scholar] [CrossRef]
Kuusk, A. Monitoring of vegetation parameters on large areas by the inversion of a canopy reflectance model. Int. J. Remote Sens. 1998, 19, 2893–2905. [Google Scholar] [CrossRef]
Myneni, R.; Hoffman, S.; Knyazikhin, Y.; Privette, J.; Glassy, J.; Tian, Y.; Wang, Y.; Song, X.; Zhang, Y.; Smith, G.; et al. Global products of vegetation leaf area and fraction absorbed PAR from year one of MODIS data. Remote Sens. Environ. 2002, 83, 214–231. [Google Scholar] [CrossRef]
Richter, R. A spatially adaptive fast atmospheric correction algorithm. Int. J. Remote Sens. 1996, 17, 1201–1214. [Google Scholar] [CrossRef]
Thome, K.; Palluconi, F.; Takashima, T.; Masuda, K. Atmospheric correction of ASTER. IEEE Trans. Geosci. Remote Sens. 1998, 36, 1199–1211. [Google Scholar] [CrossRef]
Kotchenova, S.; Vermote, E.; Matarrese, R.; Klemm, F., Jr. Validation of a vector version of the 6S radiative transfer code for atmospheric correction of satellite data. Part I: Path radiance. Appl. Opt. 2006, 45, 6762–6774. [Google Scholar] [CrossRef] [PubMed]
Guanter, L.; Alonso, L.; Moreno, J. CHRIS Proba Atmospheric Correction Module. Available online: http://www.brockmann-consult.de/beam-wiki/download/attachments/32964611/chrisbox-atmosphericcorrectionatbd-2.0.pdf (accessed on 2 June 2012).
Hagolle, O.; Huc, M.; Pascual, D.; Dedieu, G. A multi-temporal and multi-spectral method to estimate aerosol optical thickness over land, for the atmospheric correction of FormoSat-2, LandSat, VENμS and Sentinel-2 images. Remote Sens. 2015, 7, 2668–2691. [Google Scholar] [CrossRef]
Segl, K.; Guanter, L.; Gascon, F.; Kuester, T.; Rogass, C.; Mielke, C. S2eteS: An End-to-End Modeling Tool for the Simulation of Sentinel-2 Image Products. IEEE Trans. Geosci. Remote. Sens. 2015, 53, 5560–5571. [Google Scholar] [CrossRef]
Rivera, J.; Sabater, N.; Tenjo, C.; Vicent, J.; Moreno, J. Synthetic scene simulator for hyperspectral spaceborne passive optical sensors. Application to ESA’s FLEX/Sentinel-3 tandem mission. In Proceedings of the WHISPERS—6th Workshop on Hyperspectral Image and Signal: Evolution in Remote Sensing, Lausanne, Switzerland, 24–27 June 2014.
Quaife, T.; Lewis, P.; De Kauwe, M.; Williams, M.; Law, B.; Disney, M.; Bowyer, P. Assimilating canopy reflectance data into an ecosystem model with an Ensemble Kalman Filter. Remote Sens. Environ. 2008, 112, 1347–1364. [Google Scholar] [CrossRef]
Peng, C.; Guiot, J.; Wu, H.; Jiang, H.; Luo, Y. Integrating models with data in ecology and palaeoecology: Advances towards a model-data fusion approach. Ecol. Lett. 2011, 14, 522–536. [Google Scholar] [CrossRef] [PubMed]
Luo, Y.; Ogle, K.; Tucker, C.; Fei, S.; Gao, C.; LaDeau, S.; Clark, J.; Schimel, D. Ecological forecasting and data assimilation in a data-rich era. Ecol. Appl. 2011, 21, 1429–1442. [Google Scholar] [CrossRef] [PubMed]
Lewis, P.; Gómez-Dans, J.; Kaminski, T.; Settle, J.; Quaife, T.; Gobron, N.; Styles, J.; Berger, M. An Earth Observation Land Data Assimilation System (EO-LDAS). Remote Sens. Environ. 2012, 120, 219–235. [Google Scholar] [CrossRef]
Salcedo-Sanz, S.; Casanova-Mateo, C.; Muñoz Marí, J.; Camps-Valls, G. Prediction of daily global solar irradiation using temporal Gaussian processes. IEEE Geosci. Remote Sens. Lett. 2014, 11, 1936–1940. [Google Scholar] [CrossRef]
Arenas-García, J.; Petersen, K.B.; Camps-Valls, G.; Hansen, L.K. Kernel Multivariate Analysis Framework for Supervised Subspace Learning. IEEE Signal Process. Mag. 2013, 30, 16–29. [Google Scholar] [CrossRef]
Camps-Valls, G.; Jung, M.; Ichii, K.; Papale, D.; Tramontana, G.; Bodesheim, P.; Schwalm, C.; Zscheischler, J.; Mahecha, M.; Reichstein, M. Ranking drivers of global carbon and energy fluxes over land. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 4416–4419.
Oakley, J.; O’Hagan, A. Probabilistic sensitivity analysis of complex models: A Bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 2004, 66, 751–769. [Google Scholar] [CrossRef]

Figure 1. Top panels: general statistics (mean, standard deviation (SD), min–max) for PROSPECT-4 validation and corresponding emulated spectra (Top). The SD and min–max areas are presented as semi transparent so that their shapes for both datasets can be distinguished (red: RTM, blue: emulator). Consequently, the blended purple (SD) and pink (min–max) colors indicate the degree of RTM vs. emulated data overlapping. Middle and bottom panels: general statistics (mean, SD, 97.5 percentile) of derived residuals (Middle) and relative residuals (Bottom).

Figure 2. Top panels: general statistics (mean, standard deviation (SD), min–max) for PROSAIL validation and corresponding emulated spectra (Top). The SD and min–max boundaries areas are presented as semi transparent so that their shapes for both datasets can be distinguished (red: RTM, blue: emulator). Consequently, the blended purple (SD) and pink (min–max) colors indicate the degree of RTM vs emulated data overlapping. Middle and bottom panels: general statistics (mean, SD, 97.5 percentile) of derived residuals (Middle) and relative residuals (Bottom).

Figure 3. Top panels: general statistics (mean, standard deviation (SD), min–max) for MODTRAN5 validation and corresponding emulated spectra of best performing emulator (Top). The SD and min–max boundaries areas are presented as semi transparent so that their shapes for both dataset can be distinguished (red: RTM, blue: emulator). Consequently, the blended purple (SD) and pink (min–max) colors indicate the degree of RTM vs emulated data overlapping. Middle and bottom panels: general statistics (mean, SD, 97.5 percentile) of derived residuals (Middle) and relative residuals (Bottom).

Figure 4. PROSPECT-4 GSA

S_{T i}

reference results (a) and PROSPECT-4 (122 s) GSA

S_{T i}

results as generated by the emulators KRR (56 s) (b), NN (55 s) (c) and GPR (58 s) (d). On top: the used RTM or emulator and processing time. s: seconds.

Figure 4. PROSPECT-4 GSA

S_{T i}

reference results (a) and PROSPECT-4 (122 s) GSA

S_{T i}

results as generated by the emulators KRR (56 s) (b), NN (55 s) (c) and GPR (58 s) (d). On top: the used RTM or emulator and processing time. s: seconds.

Figure 5. PROSAIL GSA

S_{T i}

reference results (a) and PROSAIL (279 s) GSA

S_{T i}

results as generated by the emulators KRR (73 s) (b), NN (80 s) (c) and GPR (79 s) (d). On top: the used RTM or emulator and processing time. s: seconds.

Figure 5. PROSAIL GSA

S_{T i}

reference results (a) and PROSAIL (279 s) GSA

S_{T i}

results as generated by the emulators KRR (73 s) (b), NN (80 s) (c) and GPR (79 s) (d). On top: the used RTM or emulator and processing time. s: seconds.

Figure 6. MODTRAN5 GSA

S_{T i}

results according to best performing emulator (see Table 2) for

E_{d i f}

(GPR: 121 s) (a),

E_{d i r}

(KRR: 101 s) (b),

T_{d i f}

(NN: 157 s) (c),

T_{d i r}

(NN: 151 s) (d), S (NN: 150 s) (e) and

L_{0}

(GPR: 166 s) (f) outputs. On top: the used emulator and processing time. s: seconds.

Figure 6. MODTRAN5 GSA

S_{T i}

results according to best performing emulator (see Table 2) for

E_{d i f}

(GPR: 121 s) (a),

E_{d i r}

(KRR: 101 s) (b),

T_{d i f}

(NN: 157 s) (c),

T_{d i r}

(NN: 151 s) (d), S (NN: 150 s) (e) and

L_{0}

(GPR: 166 s) (f) outputs. On top: the used emulator and processing time. s: seconds.

Table 1. Range and distribution of input variables used to establish the PROSPECT-4, PROSAIL (PROSPECT4 + SAIL) and MODTRAN5 look-up tables. SAIL hot spot variable was fixed to 0.01.

**Table 1.** Range and distribution of input variables used to establish the PROSPECT-4, PROSAIL (PROSPECT4 + SAIL) and MODTRAN5 look-up tables. SAIL hot spot variable was fixed to 0.01.
Model Variales		Units	Minimum	Maximum
Leaf variables: PROSPECT-4
N	Leaf structure index	unitless	1.0	4.0
$C_{w}$	Leaf water content	cm	0.001	0.05
$L C C$	Leaf chlorophyll content	$μ$ g/cm $^{2}$	0.1	100
$C_{m}$	Leaf dry matter content	g/cm $^{2}$	0.001	0.05
Canopy variables: SAIL
$L A I$	Leaf area index	m $^{2}$ /m $^{2}$	0.1	10
$L A D$	Average leaf angle	∘	0	90
$s k y l$	Diffuse incoming solar radiation	fraction	0	100
$S o i l$ $c o e f f .$	Soil scaling factor	unitless	0	1
$S Z A$	Sun zenith angle	∘	0	60
$V Z A$	View zenith angle	∘	0	55
$R A A$	(Sun-sensor) relative azimuth angle	∘	0	180
Atmospheric variables: MODTRAN5
$V Z A$	Visual zenith angle	∘	0	55
$S Z A$	Solar zenith angle	∘	0	60
$R A A$	Relative azimuth angle	∘	0	180
$E L E V$	Elevation	km	0	2
$A O T$	Aerosol optical thickness	-	0	0.4
$A M S$	Angstrom Coefficient	-	0.5	3
G	Asymmetry parameter	-	−1	1
$C W V$	Columnar water vapor	g/cm $^{2}$	0	2

Table 2. MLRA emulators validation results (RMSE, NRMSE) and training processing time (s: seconds) for 500 PROSPECT-4 (leaf reflectance), PROSAIL (canopy bidirectional reflectance) and MODTRAN outputs (

L_{0}

,

E_{d i r}

,

E_{d i f}

,

T_{d i r}

,

T_{d i f}

and S). Best validated MLRA emulator per dataset is bolded.

**Table 2.** MLRA emulators validation results (RMSE, NRMSE) and training processing time (s: seconds) for 500 PROSPECT-4 (leaf reflectance), PROSAIL (canopy bidirectional reflectance) and MODTRAN outputs ( $L_{0}$ , $E_{d i r}$ , $E_{d i f}$ , $T_{d i r}$ , $T_{d i f}$ and S). Best validated MLRA emulator per dataset is bolded.
MLRA	RMSE	NRMSE (%)	CPU (s)
PROSPECT-4
KRR	0.19	0.03	1
NN	0.08	0.01	730
GPR	0.16	0.01	90
PROSAIL
KRR	1.79	0.24	1
NN	0.85	0.11	208
GPR	0.90	0.11	100
MODTRAN5 $E_{d i f}$
KRR	403.55	0.05	1
NN	1137.30	0.12	157
GPR	263.54	0.03	82
MODTRAN5 $E_{d i r}$
KRR	738.47	0.04	1
NN	3117.30	0.18	878
GPR	1372.90	0.08	123
MODTRAN5 $T_{d i f}$
KRR	0.53	0.07	1
NN	0.24	0.04	321
GPR	0.25	0.04	80
MODTRAN5 $T_{d i r}$
KRR	1.11	0.07	1
NN	0.34	0.05	280
GPR	0.48	0.03	80
MODTRAN5 S
KRR	0.29	0.04	1
NN	0.21	0.03	271
GPR	0.30	0.04	78
MODTRAN5 $L_{0}$
KRR	1489.90	0.14	1
NN	1086.40	0.43	186
GPR	729.10	0.07	94

© 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Verrelst, J.; Sabater, N.; Rivera, J.P.; Muñoz-Marí, J.; Vicent, J.; Camps-Valls, G.; Moreno, J. Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis. Remote Sens. 2016, 8, 673. https://doi.org/10.3390/rs8080673

AMA Style

Verrelst J, Sabater N, Rivera JP, Muñoz-Marí J, Vicent J, Camps-Valls G, Moreno J. Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis. Remote Sensing. 2016; 8(8):673. https://doi.org/10.3390/rs8080673

Chicago/Turabian Style

Verrelst, Jochem, Neus Sabater, Juan Pablo Rivera, Jordi Muñoz-Marí, Jorge Vicent, Gustau Camps-Valls, and José Moreno. 2016. "Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis" Remote Sensing 8, no. 8: 673. https://doi.org/10.3390/rs8080673

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis

Abstract

1. Introduction

2. Emulator Theory

3. Global Sensitivity Analysis Theory

4. Applied RTMs

4.1. PROSPECT-4

4.2. PROSAIL

4.3. MODTRAN5

5. Experimental Setup

5.1. Emulator Training and Validation

5.2. Applied GSA Strategy

6. Results

6.1. Validation of Emulators Accuracy

6.2. Validation of Emulated Spectral Profiles and Residuals

6.2.1. PROSPECT-4

6.2.2. PROSAIL

6.2.3. MODTRAN5

6.3. Global Sensitivity Analysis Results

6.3.1. PROSPECT-4

6.3.2. PROSAIL

6.3.3. MODTRAN5

7. Discussion

7.1. Interpreting Emulator Results

7.2. Interpreting Sensitivity Analysis Results

7.3. New Processing Opportunities with Emulators

8. Conclusions

Acknowledgments

Author Contributions

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI