*2.2. Estimation Algorithm*

To implement an algorithm that allows applying the mathematical model exposed in the previous section, we propose using a functional additive models to estimate the means, variances and covariances in (1). Given a sample of size *n*, {**<sup>X</sup>***i*,(*Yi*1,*Yi*2)}*ni*=1, where **X***i* = *X*1*i* (*t*),..., *Xpi* (*t*), the steps of the proposed estimation algorithm are the following:

**Step 1:** Perform a decomposition of each covariate *Xj*(*t*) in basis functions of the form *Xj*(*t*) ≈ ∑*Kk*=<sup>1</sup> *<sup>ξ</sup>jk<sup>φ</sup>k*(*t*), where *φk* (*k* = 1, ... , *K*) are *K* basis functions (i.e., B-splines, wavelets), and *ξil* are either the coefficients of an expansion in fixed basis or the principal component scores of the Karhunen-Loève expansion [44,45]. As a result, we obtain the transformed covariates

$$\tilde{\mathbf{X}}\_{i} = \left( \left( \mathbb{J}\_{i1'}^{1}, \dots, \mathbb{J}\_{iK}^{1} \right); \left( \mathbb{J}\_{i1'}^{2}, \dots, \mathbb{J}\_{iK}^{2} \right); \dots; \left( \mathbb{J}\_{i1'}^{p}, \dots, \mathbb{J}\_{iK}^{p} \right) \right) \quad i = 1, \dots, m$$

**Step 2:** For *r* = 1, 2, fit an additive model to the sample {**X˜** *<sup>i</sup>*,*Yi*1,*Yi*2}*ni*=<sup>1</sup> and obtain an estimation of the means

$$\hat{\mu}\_r(\mathbf{X}\_i) = \alpha\_r + \sum\_{j=1}^p \sum\_{k=1}^K \hat{f}\_{rk}^j(\xi\_{ik}^j) \tag{4}$$

and then estimate *σ*2*r* (**X**) from the sample {**X˜** *<sup>i</sup>*,(*Yir* − *<sup>μ</sup>*<sup>ˆ</sup>*r*(**<sup>X</sup>***i*))<sup>2</sup>}*ni*=<sup>1</sup> as

$$\hat{\sigma}\_r^2(\mathbf{X}\_i) = \exp\left(\hat{\boldsymbol{\beta}}\_I + \sum\_{j=1}^p \sum\_{k=1}^K \hat{\mathbf{g}}\_{rk}^j(\boldsymbol{\xi}\_{ik}^j)\right) \tag{5}$$

Then, compute the correlation *ρ*(**X**), which is related to the covariance by *<sup>σ</sup>*12(**X**) = *<sup>σ</sup>*1(**X**)*<sup>σ</sup>*2(**X**)*ρ*(**X**), using the sample {**<sup>X</sup>***i*, ˆ*<sup>δ</sup>i*}*ni*=1, as follows:

$$\rho(\mathbf{X}\_i) = \tanh\left(\hat{\gamma} + \sum\_{j=1}^p \sum\_{k=1}^K \hat{m}\_k^j(\boldsymbol{\xi}\_{ik}^j)\right)$$

being

$$\delta\_{i} = \frac{\left(\mathcal{Y}\_{i}^{1} - \hat{\mu}\_{1}(\mathbf{X}\_{i})\right)\left(\mathcal{Y}\_{i}^{2} - \hat{\mu}\_{2}(\mathbf{X}\_{i})\right)}{\hat{\sigma}\_{1}(\mathbf{X}\_{i})\hat{\sigma}\_{2}(\mathbf{X}\_{i})}$$

where *f jrk*, *gjrk* and *mjk* are smooth and unknown functions, *αr*, *βr* and *γ* are coefficients, *p* the number of predictors (covariates), and *K* the number of basis. Please note that the link functions *<sup>H</sup>σ*(·) = exp(·) and *<sup>H</sup>ρ*(·) = tanh(·) used in the variance and correlation structures, respectively, ensure that the restrictions on the parameter spaces (*σ*2*r* (**X**) ≥ 0 and 0 ≤ *ρ*(**X**) ≤ 1) are maintained. Moreover, in order to guarantee the identification of the model, we assume that all the means of functions *fj*, *gj* and *mj* are zero.

**Step 3:** Compute the standardized residuals

$$\left(\begin{array}{c} \varepsilon\_{l1} \\ \varepsilon\_{l2} \end{array}\right) = \hat{\Sigma}^{-1/2}(\mathbf{X}\_{l}) \left(\begin{array}{c} \mathbf{Y}\_{l1} - \boldsymbol{\mu}\_{1}(\mathbf{X}\_{l}) \\ \mathbf{Y}\_{l2} - \boldsymbol{\mu}\_{2}(\mathbf{X}\_{l}) \end{array}\right) \quad \mathbf{i} = \mathbf{1}, \ldots, \mathbf{n} \tag{6}$$

where

$$
\Sigma(\mathbf{X}\_{l}) = \begin{pmatrix}
\vartheta\_{1}^{2}(\mathbf{X}\_{l}) & \vartheta\_{12}(\mathbf{X}\_{l}) \\
\vartheta\_{12}(\mathbf{X}\_{l}) & \vartheta\_{2}^{2}(\mathbf{X}\_{l})
\end{pmatrix} \tag{7}
$$

with *<sup>σ</sup>*ˆ12(**<sup>X</sup>***i*) = *<sup>σ</sup>*ˆ1(**<sup>X</sup>***i*)*σ*ˆ2(**<sup>X</sup>***i*)*ρ*<sup>ˆ</sup>(**<sup>X</sup>***i*).

> **Step 4:** Obtain the kernel estimation of the bivariate density ˆ *f*(*<sup>ε</sup>*1,*ε*2) given by

$$f((\varepsilon\_1, \varepsilon\_2), \mathbf{H}) = \frac{1}{n} \sum\_{i=1}^{n} \mathbf{K\_H} \left( \begin{array}{c} \varepsilon\_1 - \varepsilon\_{i1} \\ \varepsilon\_2 - \varepsilon\_{i2} \end{array} \right) \tag{8}$$

where *<sup>K</sup>*(·) is the kernel which is a symmetric probability density function and **H** is a 2 × 2 positive definite matrix. Then, obtain the *τth* unconditional bivariate uncertainty region on the residual scale as

$$\mathfrak{E}\_{\tau} = \left\{ (\varepsilon\_1, \varepsilon\_2) \right\} \in \mathbb{R}^2\\|\hat{f}(\varepsilon\_1, \varepsilon\_2)\rangle \ge \hat{k}\}\tag{9}$$

ˆ *k* being the empirical *τ* quantile of the values ˆ *f*(*<sup>ε</sup>*11,*ε*12), ... , ˆ*f*(*<sup>ε</sup>n*1,*εn*<sup>2</sup>). Finally, for a given **X**, the conditional bivariate uncertainty region *<sup>R</sup>τ*(**X**) is estimated according to (3)

$$
\hat{\mathcal{R}}\_{\mathsf{T}}(\mathsf{X}) = \begin{pmatrix} \beta\_1(\mathsf{X}) \\ \beta\_2(\mathsf{X}) \end{pmatrix} + \hat{\mathsf{L}}^{1/2}(\mathsf{X})\mathbb{1}\_{\mathsf{T}} \tag{10}
$$

### **3. Case Study: Joint Forecasting of** (**SO**2, **NO***x*) **Pollution Episodes**

The mathematical model exposed in the previous section was applied to the forecasting of pollution episodes registered at a coal-fired power station located in the northwest of Spain. SO2 and NO*x* are two of the main air pollutants generated by combustion processes, and both have harmful effects on human health. Moreover, it was proven that both pollutants are correlated [46], which is consistent with the model in (1). Fortunately, pollution episodes are not very frequent and the trend is that they will become scarcer as technology advances.

Let *t*0 be the present time measured each five minutes, and SO(*<sup>t</sup>*0) and NO(*<sup>t</sup>*0) the concentrations obtained respectively by the series of bi-hourly SO2 and NO*x* means at instant *t*0. Being *th* the prediction horizon time, the interest is to predict

$$(\mathbf{Y}\_1, \mathbf{Y}\_2) = (\mathbf{SO}(\mathbf{t}\_0 + \mathbf{t}\_{\mathrm{h}}), \mathbf{NO}(\mathbf{t}\_0 + \mathbf{t}\_{\mathrm{h}})) $$

and provide an uncertainty region for these estimations given a specific value of *τ*, using the predictive covariates

$$\mathbf{X} = \left( X^1(t), X^2(t), X^3(t), X^4(t) \right) = \left( \mathbf{SO}(t), \mathbf{NO}(t), \mathbf{SO}'(t), \mathbf{NO}'(t) \right) \quad \text{with} \quad t \in \left[ t\_0 - t\_{\text{lag}}, t\_0 \right]$$

where (NO(*t*), SO(*t*)) represents the first derivatives of the functions that approximate the concentrations of both pollutants. These derivatives are obtained from the functional representation of the discrete data, according to Step 1 of the estimation algorithm. Please note that *tlag* represents the lagged time used in the predictors. In particular" we are interested in predicting an hour in advance, according to the requirements of current Spanish legislation and, therefore, we will consider *th* = 12 (60 min) from now on.

Most of the time, these concentrations times series are low, close to zero, and in order to obtain a reasonably large number of pollution incidents, we took as our sample a historical matrix {(**<sup>X</sup>***i*,**Y***i*)}*Ni*=<sup>1</sup> with pollution data of approximately 12 years, which includes a considerable number of pollution episodes (see [9] for a detailed description of the historical matrix construction). In summary, in the historical matrix not all the data are used, but only part of them, following a quantile-weighted criterion. This means that the larger the concentration, the greater the number of observations of that concentration in the sample. Figure 1 shows a sample of the historical matrix, on top are the curves of both pollutants (NO(*t*), SO(*t*)) measured in *tlag* = 20 discretization points and evaluated in 5 B-spline basis functions of order *p* = 4. On the bottom, the first derivative of the B-spline curves (NO(*t*), SO(*t*)) with order *p* − 1 are represented.

**Figure 1.** Five curves of the historical matrix for pollutants (NO(*t*), SO(*t*)) and their corresponding first derivatives (NO(*t*), SO(*t*)), observed in a period of time *tlag* = 20.

In this paper, we tested four models using as predictors different combinations of the covariates -*<sup>X</sup>*<sup>1</sup>(*t*), *<sup>X</sup>*<sup>2</sup>(*t*), *<sup>X</sup>*<sup>3</sup>(*t*), *X*<sup>4</sup>(*t*) that include the concentrations of both pollutants and their first derivatives. In particular, we will consider models given by:

$$
\begin{pmatrix} SO(t\_0 + 12) \\ NO(t\_0 + 12) \end{pmatrix} = \begin{pmatrix} \mu\_1(\mathbf{X}^1) \\ \mu\_2(\mathbf{X}^2) \end{pmatrix} + \begin{pmatrix} \sigma\_1^2(\mathbf{X}^1) & \sigma\_{12}(\mathbf{X}^1, \mathbf{X}^2) \\ \sigma\_{12}(\mathbf{X}^1, \mathbf{X}^2) & \sigma\_2^2(\mathbf{X}^2) \end{pmatrix} \begin{pmatrix} \varepsilon\_1 \\ \varepsilon\_2 \end{pmatrix} \tag{11}
$$

The four considered models, M1, M2, M3 and M4, are configured in Table 1 where the cross X indicates the covariates included in each model.

**Table 1.** Selected models from equation in (11) . Cross X indicates the covariates included in each of the four considered models. The derivatives of the functions are indicated with a single quote.


To validate and compare the four proposed models, we randomly select from the full historical matrix a training set **M***<sup>I</sup>* = (**X***Ii* , **Y***Ii*)*ntrain i*=1 and a test set **M***<sup>I</sup> I* = (**X***<sup>I</sup> I i* , **Y***<sup>I</sup> I i* )*Ni*=*ntrain*+1.

The estimates *μ*ˆ1, *μ*ˆ2, **Σˆ** were obtained from the samples in the first matrix *MIt* . The bivariate uncertainty regions for the values of the covariates on the second matrix *M<sup>I</sup> I* were obtained using (3). Theestimatedcoverage*τ*ˆisgivenby

$$\mathfrak{F} = \frac{1}{n\_{test}} \sum\_{i=n\_{train}+1}^{N} I\{\mathbf{Y}\_i^{II} \in \mathcal{R}\_\tau(\mathbf{X}\_i^{II})\}; n\_{test} = N - n\_{train} \tag{12}$$

The performance of the proposed predictors was evaluated in two pollution incidents. A bivariate representation of these episodes is shown in Figure 2. The orientation of the points shows a clear correlation between both pollutants although the range of concentrations is quite different.

**Figure 2.** Observed and forecasted concentrations of SO2 and NO*x* for two pollution episodes.

The nominal and the estimated coverages for different time lags and training sample sizes are shown in Table 2. The coverages correspond to the bivariate solution, and were obtained for *ntest* consecutive observations that might or might not correspond to pollution incidents.

**Table 2.** Nominal *τ* and estimated *τ*ˆ coverages for each of the four models under study. Two time lags, *tlag* = 10, *tlag* = 20, two sizes of the training sample *ntrain* = 10, 000 and *ntrain* = 4900, and two numbers of principal components, K = 3 and K = 5, were considered. Results correspond to the test sample.


RMSE values for each model are shown in Table 3 , considering an expansion of the functions in three or five principal components, and lags *tlag* = 10 and *tlag* = 20. Please note that this table makes reference to the marginal distributions, and that the range of concentrations for each pollutant is very different, therefore the RMSE values are also different.


**Table 3.** RMSE values for two pollution episodes and the four models tested, considering curves with two different time lags, size of the training samples and number of principal components.

Note: *ntrain* = 100 · *<sup>n</sup>*•*train*.

For the two episodes analyzed, Figure 3 shows the observed and the predicted values as well as the quantile for *τ*ˆ = 0.95, calculated for the test sample. The results correspond to curves observed in ten points (*tlag* = 10) and represented in a basis expansion in three functional principal components. These univariate confidence intervals were respectively constructed from (11) as *<sup>μ</sup>*1(**X**<sup>1</sup>) + *<sup>σ</sup>*1(**X**<sup>1</sup>)*ε*0.95 1 and *<sup>μ</sup>*2(**X**<sup>2</sup>) + *<sup>σ</sup>*2(**X**<sup>2</sup>)*ε*0.95 2 , *ε*0.95 1 and *ε*0.95 1 being the 0.95 quantile of the distributions of errors *ε*1 and *ε*2, respectively.

**Figure 3.** Observations (solid black line), mean (solid gray line) and 0.95*th* quantile estimations (discontinuous line) for both pollutants, SO2 and NO*x* and for two pollution episodes.

Table 4 shows the maximum consumed memory and the runtime (in seconds) for the four models tested and two different dimensions of the submatrices are executed in a Intel Core i7-2600K with 16 GB of RAM.


**Table 4.** Maximum memory consumption (MB) and computation time (seconds) for the four models tested, *Mi*, following different strategies concerning the time lag, *tlag*, the size of the training sample, *ntrain*, and the number of basis functions, K.

Note: *ntrain* = 100 · *<sup>n</sup>*•*train*.
