This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

In the case of an environmental accident, the initially available data are often insufficient for properly managing the situation. In this paper, new sensor observations are iteratively added to an initial sample by maximising the global expected value of information of the points for decision making. This is equivalent to minimising the aggregated expected misclassification costs over the study area. The method considers measurement error and different costs for class omissions and false class commissions. Constraints imposed by a mobile sensor web are accounted for using cost distances to decide which sensor should move to the next sample location. The method is demonstrated using synthetic examples of static and dynamic phenomena. This allowed computation of the true misclassification costs and comparison with other sampling approaches. The probability of local contamination levels being above a given critical threshold was computed by indicator kriging. In the case of multiple sensors being relocated simultaneously, a genetic algorithm was used to find sets of suitable new measurement locations. Otherwise, all grid nodes were searched exhaustively, which is computationally demanding. In terms of true misclassification costs, the method outperformed random sampling and sampling based on minimisation of the kriging variance.

In case of calamities such as the major Fukushima Daiichi nuclear power plant accident in Japan [

The consequences of decisions made about local safety are costly. On the one hand, evacuating and cleaning sites are expensive tasks and these costs can be avoided whenever locations are safe. On the other hand, not cleaning or evacuating unsafe areas will put the population at risk, which typically involves even higher costs at later stages. Thus, the problem faced is twofold: (1) deciding between safe and unsafe areas and (2) deciding about when and where to sample so that the obtained data optimally support decision making.

In contrast to [

Recent exceptions are [

While the work of Ballari

EVOI is estimated as the difference between expected costs at the present stage of knowledge and expected costs when new information becomes available [

In this work, decision making was assumed to be based on Bayes actions, with costs C_{false negative} and C_{false positive} attached to the two types of misclassification.

Hence, the expected cost of the upper branch was calculated by
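The equation following this sentence appears to have been lost in extraction. Assuming the upper branch is the branch in which a sensor is placed, a reconstruction consistent with the Bayes-action framing above (with p the current probability of contamination and p_z the posterior probability after measurement outcome z) would read:

```latex
C_{\text{upper}} = \sum_{z \in \{0,1\}} P(Z = z)\,
  \min\bigl\{\, p_z\, C_{\text{false negative}},\;
               (1 - p_z)\, C_{\text{false positive}} \,\bigr\}
```

while the lower (no measurement) branch has expected cost C_{lower} = min{ p C_{false negative}, (1 − p) C_{false positive} }, i.e., the cost of the cheaper of the two Bayes actions at the present state of knowledge.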

EVOI thus corresponds with the difference between the expected costs of the two branches, C_{lower} − C_{upper}.
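For a single prospective binary measurement, the computation can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the function names and the posterior probabilities are hypothetical, and the marginal probability of a positive reading is recovered from the law of total probability.

```python
def expected_cost(p, c_fn, c_fp):
    """Expected cost of the Bayes action when the probability of
    contamination is p: declare 'safe' (risking p * c_fn) or
    'unsafe' (risking (1 - p) * c_fp), whichever is cheaper."""
    return min(p * c_fn, (1 - p) * c_fp)

def evoi(p, p_if_pos, p_if_neg, c_fn, c_fp):
    """EVOI of one binary measurement.

    p        -- current probability of contamination
    p_if_pos -- posterior probability if the sensor reads 'above threshold'
    p_if_neg -- posterior probability if it reads 'below threshold'
    """
    # Law of total probability: p = p_pos * p_if_pos + (1 - p_pos) * p_if_neg
    p_pos = (p - p_if_neg) / (p_if_pos - p_if_neg)
    c_lower = expected_cost(p, c_fn, c_fp)
    c_upper = (p_pos * expected_cost(p_if_pos, c_fn, c_fp)
               + (1 - p_pos) * expected_cost(p_if_neg, c_fn, c_fp))
    return c_lower - c_upper

# Illustrative numbers; c_fp = 2 and c_fn = 3 as in the experiments below.
value = evoi(0.4, p_if_pos=0.9, p_if_neg=0.1, c_fn=3, c_fp=2)
```

Because the posterior Bayes actions can never do worse on average than the prior one, EVOI is non-negative by construction.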

For mapping, however, we are concerned with a study area and aim to find the optimal additional sample locations as the new configuration that maximises global EVOI and thus minimises C_{upper}.

Computation of this joint probability employed indicator kriging as well, within an iterative procedure. Writing z_{1}, ..., z_{n} as shorthand for the outcome Z_{1} = z_{1}, ..., Z_{n} = z_{n}, the joint probability of two prospective outcomes was factorised as P(z_{1,now}, z_{2,now}) = P(z_{1,now}) · P(z_{2,now} | z_{1,now}), where the conditional probability P(z_{2,now} | z_{1,now}) was obtained by indicator kriging after adding z_{1,now} to the data.

It can be easily seen that computational demands increase dramatically with the number of locations to be optimised simultaneously. For example, with two simultaneous observations, four expected cost maps and their probabilities need to be computed for each pair of measurement locations being evaluated, while the solution space increases by a factor 0.5(N − 1), with N the number of candidate grid nodes (there are 0.5 · N(N − 1) candidate pairs against N single locations).
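The growth of the solution space can be verified with a few lines (a sketch; `n` stands for the number of candidate grid nodes):

```python
from math import comb

def n_configurations(n_nodes, k):
    """Number of candidate configurations when k locations are
    selected simultaneously out of n_nodes grid nodes."""
    return comb(n_nodes, k)

n = 100 * 100                    # grid nodes in the static example below
single = n_configurations(n, 1)  # n candidate single locations
pairs = n_configurations(n, 2)   # n * (n - 1) / 2 candidate pairs
factor = pairs / single          # equals 0.5 * (n - 1)
```

For a 100 × 100 grid this factor is 0.5 × 9999, i.e., evaluating pairs is roughly five thousand times more expensive than evaluating single locations, before even counting the extra expected cost maps per pair.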

Indicator kriging is a pragmatic approach for mapping the probability that a random variable, say Z, exceeds a given threshold.

For our purpose, we defined a single threshold at the critical level of the pollutant. Therefore, the order-relation problem only concerned conformance to the [0, 1] interval. Resuming the notation used earlier and noting that we do not observe the phenomenon itself but rather measure its presence with a sensor that is prone to measurement error, the indicator transform was applied to the sensor readings: the indicator takes the value 1 if a reading exceeds the critical level and 0 otherwise.
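A minimal sketch of this indicator transform, assuming additive Gaussian measurement error (the noise level and the values used are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def indicator(true_value, threshold, noise_sd, rng):
    """Indicator transform of a noisy sensor reading: the sensor
    observes the true value plus measurement error, and the indicator
    records whether the reading exceeds the threshold."""
    reading = true_value + rng.normal(0.0, noise_sd)
    return 1 if reading > threshold else 0

# A true value just below the critical level of 20 ppm is sometimes
# classified as exceeding it because of measurement error.
hits = sum(indicator(19.5, 20.0, noise_sd=1.0, rng=rng)
           for _ in range(10_000))
proportion = hits / 10_000
```

This is exactly why measurement error has to enter the probability model: the indicator of the reading is a noisy version of the indicator of the true state.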

For computation of global EVOI, the two possible states of each of the n prospective measurements give rise to 2^{n} potential outcomes, each requiring its own expected cost map.

Often, geostatistical interpolation is not based solely on observations of the target variable; when auxiliary data are available, hybrid interpolation techniques that combine different data sources can be used to improve prediction. If the auxiliary data exhaustively cover the study area, regression kriging [

Space-time geostatistics enable data analyses and prediction by taking into account the joint spatial and temporal dependence between observations [

Pollution of the environment by, for example, deposited radionuclides or polyaromatic hydrocarbons after some calamity can be represented by a static field if autonomous changes to the system are slow in comparison to the length of the measurement campaign and the subsequent management of the problem. To illustrate the EVOI approach on a static field, a synthetic data set was constructed by applying a threshold at 20, say ppm, to a stationary Gaussian random field of 100 × 100 grid cells of unit size with a mean of 20 (ppm), a nugget of 1 (ppm^{2}) representing short-range variability, and an isotropic spherical structural spatial correlation component with a range of 40 spatial units and a partial sill (semivariance) of 16 (ppm^{2}). Sensor data were simulated by sampling the synthetic data (
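The construction of such a synthetic field can be sketched with NumPy alone, via unconditional simulation by Cholesky factorisation; the grid is reduced here to 30 × 30 so the covariance matrix stays small, and all names are illustrative rather than the paper's code:

```python
import numpy as np

rng = np.random.default_rng(42)

def spherical_cov(h, psill, rng_a):
    """Spherical covariance: partial sill `psill`, range `rng_a`."""
    return np.where(h < rng_a,
                    psill * (1 - 1.5 * h / rng_a + 0.5 * (h / rng_a) ** 3),
                    0.0)

# Reduced 30 x 30 grid; the paper uses 100 x 100 unit cells with the
# same variogram parameters (nugget 1, partial sill 16, range 40).
n = 30
xx, yy = np.meshgrid(np.arange(n), np.arange(n))
coords = np.column_stack([xx.ravel(), yy.ravel()])
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

cov = spherical_cov(h, psill=16.0, rng_a=40.0)
cov[np.diag_indices_from(cov)] += 1.0          # nugget of 1 ppm^2
L = np.linalg.cholesky(cov + 1e-8 * np.eye(n * n))

field = 20.0 + L @ rng.standard_normal(n * n)  # mean 20 ppm
contaminated = (field > 20.0).reshape(n, n)    # threshold at 20 ppm
```

Thresholding the Gaussian field at its mean yields the binary "true" contamination map against which misclassification costs can later be scored.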

Three scenarios were considered for adding new measurement locations to the original sample:

add a single measurement at a time at the location leading to the highest global EVOI, moving the sensor with the lowest cost (in this case Euclidean distance);

select two sensors and add measurements from two locations simultaneously by scanning the area that can be reached by each sensor within a single time step. Again the solution leading to the highest global EVOI is chosen at each time step;

add two sample locations simultaneously by scanning the complete area for the highest global EVOI and move the sensors with lowest cost distance. To speed up computations, a genetic algorithm,
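The genetic algorithm mentioned in scenario 3 might, in stripped-down form, look as follows. This is a sketch only: the fitness function is a toy stand-in for global EVOI, and all parameter choices (population size, truncation selection, mutation step) are illustrative rather than those used in the paper.

```python
import random

random.seed(0)
N = 100 * 100                      # candidate grid nodes

def fitness(pair):
    """Toy stand-in for global EVOI of a pair of locations; in the
    paper this would be the EVOI of the candidate configuration."""
    a, b = pair
    return -((a - 4242) ** 2 + (b - 7777) ** 2)

def mutate(pair):
    # Perturb each gene by up to 50 nodes, clamped to the grid.
    return tuple(min(N - 1, max(0, g + random.randint(-50, 50)))
                 for g in pair)

def crossover(p1, p2):
    # Single-point crossover: one location from each parent.
    return (p1[0], p2[1])

pop = [(random.randrange(N), random.randrange(N)) for _ in range(40)]
for _ in range(200):               # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]             # elitist truncation selection
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(30)]
    pop = parents + children

best = max(pop, key=fitness)
```

Because the best candidates survive unchanged each generation, fitness never decreases, and the population concentrates around the optimum of the surrogate objective.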

The costs of misclassification were arbitrarily set at 2 and 3 cost units for false positives and false negatives, respectively. As indicated above and similar to other work [

Maps generated by EVOI optimisation (scenario 1) were compared with maps interpolated using measurements obtained by: (1) random sampling and (2) sample locations determined by minimisation of the kriging variance [

A dynamic plume of some pollutant which affected a 400 × 400 m area of Wageningen University campus was simulated. The plume was composed of a deterministic part,

The deterministic part of the plume (thus excluding the stochastic deviations) was assumed to be given at the prediction stage. Accordingly, the deterministic plume was available as an auxiliary data source to support mapping presence/absence of the dynamic plume. To this end, regression kriging with logistic regression on the deterministic plume was employed. Though including further explanatory variables would have been methodologically feasible, only the deterministic plume was used in the regression model. For practical reasons, the regression coefficients were assumed to be static; they were determined just once, by maximum likelihood, based on the initial state of the true plume using a sample of 441 observations on a regular grid. Residual variation was modelled by a spatio-temporal Gaussian random field.

For predicting the spatio-temporal residuals, the response of the logistic regression model of the corresponding moment in time was subtracted from realised previous measurements and potential current measurement outcomes. The interpolated residuals were subsequently added to the logistic response and truncated to the interval [0, 1] to compute the probabilities of presence and absence.
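The prediction step described above can be sketched as follows. The coefficients and input values are hypothetical placeholders; in the paper the coefficients were fitted once by maximum likelihood, and the residuals would come from space-time kriging rather than being supplied directly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical regression coefficients for the logistic trend.
b0, b1 = -4.0, 0.25

def probability_of_presence(deterministic_plume, kriged_residuals):
    """Regression-kriging style prediction: logistic response on the
    deterministic plume plus interpolated residuals, truncated to the
    [0, 1] probability interval."""
    trend = sigmoid(b0 + b1 * deterministic_plume)
    return np.clip(trend + kriged_residuals, 0.0, 1.0)

det = np.array([0.0, 10.0, 40.0])   # hypothetical plume values (ppm)
res = np.array([0.3, -0.9, 0.1])    # hypothetical kriged residuals
p = probability_of_presence(det, res)
```

The truncation is what keeps the hybrid prediction a valid probability: a strongly negative residual cannot push the estimate below 0, nor a positive one above 1.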

Previous observations of spatio-temporal residuals contain information on the next state since, by construction, the residual field is spatially and temporally correlated. However, unlike the static case, repeated measurement at the same location adds information to the system, since the temporal correlation is less than 1. With 16 sensors there are 2^{16} = 65536 potential outcomes of the sensor measurements, which renders exhaustive search over all possibilities prohibitively expensive. In this example, the search space was substantially reduced by allowing only a single measurement location per time step to be changed; the other 15 sensors remained stationary. For each of the 16 sensor locations of the previous time step we tested alternative locations, and the configuration having the lowest accumulated expected misclassification costs was selected. Initial sensor locations were again chosen on a regular grid.

The true misclassification costs accumulated over time and space achieved by EVOI sampling were compared with those obtained by:

at each time step, randomly selecting one sensor and measuring at a single randomly selected vacant location; the other 15 sensors stay and measure at their previous locations (Random1);

random relocation of all sensors at each time step (Random16);

repeated measurement at the initial regularly spaced sample locations (Fixed).

The random sampling methods (1 and 2) were repeated 1,000 times.

Euclidean distance was used for deciding which sensor to move to the next location, but another cost criterion could have been used with only minor modification of the algorithm, as was demonstrated in [

While the results obviously depend on the choice of the start locations, the figure exemplifies that one has to be careful in focusing on sensor constraints for planning adaptive sampling. In doing so, highly informative sites may never be visited because they are hidden behind data from earlier measurements. In contrast, a global search will identify those relevant sites and, next, sensor constraints may be used to find a strategy for reaching their locations.

Differences between real costs (usually not known) and expected costs (see the right-hand side of

No attempts were made to optimise the computer code, which resulted in a run time of several days to calculate relocation of a single sensor over the six time steps. We are well aware that this is far from realistic and therefore suggest the following improvements:

re-utilisation of the kriging weights during the analysis of the possible measurement outcomes of a sensor configuration. This is feasible because kriging weights are independent of the measured values, they only depend on the spatio-temporal configuration of data points;

using a search heuristic rather than exhaustive search, see also Section 3.1;

using dedicated compiled software rather than a script running in R.
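Suggestion (1) rests on the fact that kriging weights depend only on the covariance structure and the data configuration, so the kriging system can be solved once and then reused for every hypothetical outcome vector. A simple-kriging sketch with illustrative covariance values:

```python
from itertools import product

import numpy as np

# Simple-kriging weights for one prediction point: they depend only on
# the covariances among data locations (K) and between data locations
# and the prediction point (k) -- not on the measured values.
K = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.5],
              [0.2, 0.5, 1.0]])     # data-to-data covariances
k = np.array([0.6, 0.3, 0.1])       # data-to-prediction covariances
w = np.linalg.solve(K, k)           # linear system solved once

# The same weights serve every hypothetical indicator outcome, so no
# kriging system is re-solved inside the loop over the 2^n outcomes.
predictions = {z: float(w @ np.array(z))
               for z in product([0, 1], repeat=3)}
```

For n data points this replaces 2^{n} solves of the kriging system by a single solve plus 2^{n} cheap dot products.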

Particularly suggestion (2) requires further research which should include finding appropriate parameter settings for the optimiser. A suitable search algorithm should also allow for concurrent site selection for multiple sensors. As explained in Section 3.2, this greatly affects the complexity of the problem.

EVOI sampling led to lower accumulated real misclassification costs (1.360e+5) than any of the alternatives considered, as can be observed in

The expected value of information (EVOI) approach allocates new observations at locations that intuitively make sense. Moreover, comparison with random sampling and sampling aiming for minimum kriging variance showed that the expected misclassification costs were significantly reduced with EVOI sampling. The method accounts for data values and specified misclassification costs. The latter can be dissimilar for different kinds of errors (false positives versus false negatives).

Constraining potential sample locations to the space that can be travelled by a small set of mobile sensors is a flawed strategy since the sensors may get trapped in some area and may thus fail to visit highly informative spots that are screened by previous observations. A better approach would be to first perform a global search for the highest EVOI and next use sensor constraints for deciding which sensors to move to the selected measurement sites.

With the help of indicator kriging, computation of EVOI for a given set of sample locations is computationally inexpensive and methodologically simple. Finding the optimal sensor locations, however, remains a very demanding task. This particularly holds when selecting multiple locations simultaneously such as in case of monitoring a dynamic spatial field. Meta-heuristic optimisers including genetic algorithms and simulated annealing may be useful for these situations, but this requires research beyond the scope of the current paper.

In this work, parameter uncertainty and uncertainty about the geostatistical model were not taken into account. However, divergence between the true and the expected misclassification costs after adding several measurements indicates that the approach is sensitive to model misspecifications. This may be consequential, for example if EVOI is used for deciding whether or not to stop a survey. Model parameterisation and dealing with uncertainty in the geostatistical model are therefore other aspects requiring further research.

Decision tree showing decisions to place a sensor (sensor) or not (no sensor).

Static field. (

Configuration of initially regularly spaced sensors after two iterations with a single observation per step (scenario 1). First, sensor 2 moved (white arrow) and a measurement was made; next, sensor 5 moved (black arrow) but its measurement has not yet been made.

Effect of the way sensor constraints are taken into account on aggregated misclassification costs with two simultaneously moving sensors (scenarios 2 and 3).

Probability of presence after 15 time steps in which two simultaneous sensor measurements were added (scenario 2). Each sensor only scanned a limited neighbourhood around its current position (indicated by the dashed circle) for the optimal solution, which caused the sensors to get trapped. The concentric black-and-white dots (eyes) indicate the newly selected locations.

EVOI sampling of a dynamic plume (ppm), with single sensor relocation per time step. The black line delineates the critical level for the deterministic plume; black/white dots are the sensor locations.

Comparison of the accumulated real misclassification costs of four sampling approaches. Distributions for Random1 and Random16 were obtained from 1,000 realisations.

Semivariance structure of the Gaussian random field added to the deterministic plume.

Component | Model ^{a} | Sill (ppm^{2}) | Range | Anisotropy (m/min)
---|---|---|---|---
γ_{s} | Sph | 200 | 320 m | -
γ_{t} | Sph | 50 | 35 min | -
γ_{st} | Sph | 50 | 150 m | 7.14

^{a} “Sph” denotes a spherical shape.

Semivariance structure used for predicting residuals for the dynamic plume.

Component | Model ^{a} | Sill | Range ^{b} | Anisotropy (m/min)
---|---|---|---|---
γ_{s} | Exp | 0.085 | 33 m | -
γ_{t} | Exp | 0.025 | 12 min | -
γ_{st} | Exp | 0.025 | 50 m | 7.14

^{a} “Exp” denotes an exponential shape; ^{b} range parameter: the practical range is approx. 3 × the listed value.

EVOI improvements with respect to random sampling and minimisation of kriging variance.

Random | 7.1 | 12.0 | 2.4e−08 |

Min. kriging var. | 3.0 | 9.9 | 1.6e−03 |