1. Introduction
We consider the general problem of exploration using multiple mobile sensors, or fixed sensors whose field of view is reconfigurable. The problem is to locate regions where a concentration of the physical quantity of interest occurs and then, having found such regions, to expend sensor capability to refine the data there while continuing to search for new interesting regions. That is, after initial discovery, there is a trade-off between increasing knowledge by taking more measurements in regions already known to be of interest and increasing knowledge by exploring regions where concentrations of the phenomenon of interest (PoI) may exist but are not yet known. For brevity, we refer to regions containing a high concentration of the PoI as
interesting regions. Due to the uncertainties of exploration, the problem is not posed as one of optimal path (or resource) planning, but as a problem that balances the competing imperatives of refining measurements and exploring new territory. Therefore, the problem studied here is different from the problems raised in other research fields such as exploration and path planning in robotics and unmanned aerial vehicles (UAVs) [1,2,3,4,5,6,7,8].
There exist various approaches for sensor array configuration or sensor placement [9,10,11,12,13,14,15,16,17,18]. For example, Zhang provides a necessary condition for optimal sensor placement in two-dimensional space using the algebraic structure of sensors [10]. The problem of sensor array configuration in a remote sensing formulation is addressed in [19], in which a statistically optimal criterion is used to identify a solution. By contrast, in this paper, a more dynamic, exploratory stance is taken, trading off typically repeated (or related) measurements against measurements in new areas. Here, we focus on Gaussian process (GP) modeling to formulate the sensor placement problem [13,14,15,16,17,18]. A Gaussian process, as a Bayesian nonparametric tool, is useful for modeling spatiotemporal data, especially in cases where the data contain random variations and thus cannot be well represented by parametric models [20]. GPs have been used for decades as a supervised learning tool for regression problems, known as Gaussian process regression (GPR) models [21,22]; in the geostatistics literature they are also referred to as kriging, named after the mining engineer D.G. Krige [23,24,25]. GPR models are used here as a spatiotemporal interpolation/extrapolation tool to predict the PoI at unsampled data points. GPR models and kriging methods are applicable to a wide variety of problems, such as modeling and control in robotics-related applications; the prediction and estimation of temperature, precipitation, missing pixels and unmixing of pixels in hyperspectral imaging (HSI); human head pose estimation; and the concentration of carbon dioxide in the atmosphere [22,26,27,28,29,30,31,32,33,34,35]. As an example in HSI, one main objective is to unmix the spectral information in order to infer the composing materials in the scene. Imbiriba et al. [30] consider a nonlinear model where the underlying function is governed by a Gaussian process in order to detect nonlinearly mixed pixels. Xing et al. [33] introduce an algorithm for dictionary learning based on a GP prior to remove noise and infer missing data in HSI. Another example is the prediction of temperature from data collected at meteorological stations. For instance, Wu and Li [31] apply the residual kriging method to predict the average monthly temperature at 500 unknown locations in the United States.
More closely related to this research, GPs have also been applied to sensor placement problems [13,14,15,16,17,18]. In [15], Garnett et al. propose a Bayesian optimization algorithm for sensor placement based on GP modeling; they applied their approach to place 50 sensors around the UK to measure air temperature. A typical sensor placement technique is to use the variances associated with the maximum a posteriori (MAP) estimates of a GP as a measure of the amount of uncertainty in the region. This leads to placing the sensors at the locations with the highest variance (entropy) [17,18], in order to reduce the overall entropy in the region under study. This characterization of the quality of sensor placements is naive for the following reasons. As observed in [14,18], sensor placement using only the variance usually forces the sensors to the borders of the region under study: since there are no measurements outside the region, the borders tend to have very few measurements in their neighborhood to be used in the training set of the GP. To tackle this issue, a mutual information criterion is proposed in [16] in order to place the sensors at the locations most informative about the unseen locations; however, the resulting optimization turns out to be NP-hard. In [14], an approximate form of the mutual information optimization is considered, which selects informative sensor locations by exploiting the submodularity of mutual information. However, if we continue to perform sensor placement successively using either the entropy of the region or the mutual information criterion, there is a high chance of ending up with some sort of uniform sampling (equally spaced sensor placements) in the region, because the placements tend to occur at locations far away from the locations already visited by the sensors. These criteria only fulfill the exploration goal, without accounting for refining measurements of the interesting features of the underlying phenomenon. In addition, the available budget on sensors may not allow widely scattered sensor placements (or placing the sensors far away from their previous positions) in applications such as the space missions that exemplify our work.
In this paper, we devise a general framework to specify the trajectories of mobile sensors in order to find and characterize concentrations of the quantity of interest for the PoI in the region under study. This work differs from previous work in this area because we seek to fulfill both the exploration and the data refinement desires. Here, we assume that some interesting phenomena occur in the region under study; therefore, a mutual information or entropy measure alone is not sufficient. The proposed framework adaptively makes value-laden decisions on sensor placement. It consists of two main stages. The first is the prediction stage, where we apply a Gaussian process regression model to predict the PoI at the unsampled locations; the GPR provides not only interpolated/extrapolated values but also the variance of the estimated values. The question is then how to decide on the next set of trajectories based on the information obtained from the previous (GPR) stage. There is not sufficient information to obtain an optimal solution a priori, since the available information is local (the result of previous measurements) and may change over time, and the decision between improving information in the neighborhood of a known PoI and exploring new territory in the hope of discovering additional interesting locations can be hard or contradictory. In order to deal with these issues, we set up the decision-making based on epistemic utility theory [36,37,38,39], which forms the second stage of our framework. Epistemic utility provides decisions, called satisficing decisions, based on local rather than global information, in a way that explicitly trades off between two different goals [40].
The remainder of this paper is organized as follows. In Section 2, some background on the GPR model is presented, and decision-making based on the epistemic utility controller is described. In Section 3, these tools are applied to the problem of trajectory determination of mobile sensors. Section 4 contains an example illustrating the effectiveness of the proposed approach.
3. Trajectory Determination of Mobile Sensors
In this section, we propose a general framework to determine the trajectories of mobile sensors in order to explore locations containing concentrations of the physical quantity of interest in the region under study. Assume that an initial trajectory of the sensors has already been determined. As may happen in realistic situations, the measurements collected along these initial trajectories may not necessarily be very informative. Once some measurements are collected over the region, the goal becomes using the capability of the sensors in follow-on trajectories to both refine the data and explore for interesting regions. Our proposed framework contains two main stages: the prediction stage and the decision stage. A single pass of the sensors samples only a small fraction of the entire region under study, so decisions about whether to explore other unseen areas must be based on predictions of the behavior of the interesting phenomenon. Such predictions help decide how to re-place the mobile sensors or, equivalently, how to determine the next set of trajectories of the existing sensors.
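The interplay of the two stages can be sketched as a simple loop. The function names below (`predict`, `decide`, `measure`) are hypothetical stand-ins for the GPR stage, the epistemic utility stage, and the physical sensing step; they are not part of the paper's formulation.

```python
def run_framework(candidate_trajs, predict, decide, measure, n_rounds):
    """Alternate between the prediction stage and the decision stage,
    growing the training set with the measurements of each chosen trajectory."""
    train_X, train_y, visited = [], [], []
    for _ in range(n_rounds):
        mean, var = predict(train_X, train_y)    # GPR estimate + uncertainty
        k = decide(mean, var, candidate_trajs)   # index of the next trajectory
        xs = list(candidate_trajs[k])
        ys = measure(xs)                         # collect the new samples
        train_X.extend(xs)
        train_y.extend(ys)
        visited.append(k)
    return visited, train_X, train_y
```

Any concrete predictor, decision rule, and sensor model can be plugged in; the loop itself only encodes the predict/decide/measure cycle of Figure 2.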
The prediction stage employs GPR modeling to estimate the PoI at unseen locations. In this setting, the collected data are treated as the training set: the spatiotemporal locations of the measurements form the input training data, and the corresponding measurements are the output training data. The input test data are the remaining unseen locations in the region under study; the output test data are unknown and are predicted using the GPR. Without loss of generality, we assume that the input training data are collected into the set $U$, where each input is a vector of dimension $d$ (for example, two spatial coordinates plus one time coordinate for spatiotemporal input data). The output training data, for the scalar output case, are accumulated in a corresponding output set, with one output for each input. The spatiotemporal information of the test data is collected into a separate test set, and the unknown outputs evaluated at the input test data are the quantities to be predicted. As prior knowledge, we assume that the joint density function of the training and test data is zero-mean Gaussian, meaning that on average we expect interesting phenomena to occur rarely. The GPR model was defined in Equation (2); the kernel function for constructing the covariance matrices in Equation (2) will be defined later. The predictive posterior distribution over the test outputs for the noisy observation case was given in Equations (3) and (4).
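For concreteness, the predictive quantities referred to above (the posterior mean and variance of Equations (3) and (4)) can be sketched as follows. This is a minimal zero-mean GPR with a generic squared exponential kernel; the kernel choice and hyperparameter values are illustrative, not the exact configuration used in the paper.

```python
import numpy as np

def sq_exp_kernel(A, B, ell=1.0):
    """Squared exponential kernel matrix between input sets A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def gpr_posterior(X, y, Xs, ell=1.0, noise=1e-2):
    """Zero-mean GP posterior mean and pointwise variance at test inputs Xs,
    given noisy training pairs (X, y)."""
    K = sq_exp_kernel(X, X, ell) + noise * np.eye(len(X))  # training covariance
    Ks = sq_exp_kernel(X, Xs, ell)                         # train/test covariance
    Kss = sq_exp_kernel(Xs, Xs, ell)                       # test covariance
    alpha = np.linalg.solve(K, y)
    mean = Ks.T @ alpha
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mean, np.diag(cov)
```

Far from the training inputs the posterior reverts to the zero-mean prior with unit variance, which is exactly the behavior exploited later when trading exploration against refinement.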
In Figure 1, we show an example of sampling over the region under study: the dashed lines represent the trajectory of a satellite, the red circles denote the locations where samples are taken, and the rectangular shape is the region under study. A scalar function quantifies the physical PoI over the region, and each time frame corresponds to one visit of the satellite to the region (the mth frame for the mth visit). We assume that the measurements within a given time frame are taken much faster than the changes in the PoI. We also assume that the PoI exhibits some sort of smooth behavior in the region under study. Therefore, while the satellite operates within a specific time frame, we expect high correlation between nearby samples. We further assume (here, for simplicity) that the time it takes the satellite to revisit the region is shorter than the time over which the PoI changes. Therefore, if the satellite happens to follow the same trajectory in two different time frames, a high correlation between the data collected in those time frames is expected when the frames are close in time.
Under the smoothness assumption for the PoI, and in order to account for the correlation that may exist between nearby measurements, we define the following squared exponential covariance function:

$k\big((\mathbf{s}_i, t_i), (\mathbf{s}_j, t_j)\big) = \exp\!\left(-\frac{\|\mathbf{s}_i - \mathbf{s}_j\|^2}{2\ell_s^2} - \frac{(t_i - t_j)^2}{2\ell_t^2}\right),$  (7)

where $\ell_s$ and $\ell_t$ are scaling factors, $\mathbf{s}_i$ and $\mathbf{s}_j$ are the spatial information (coordinates) of any two arbitrary locations in the region under study, and $t_i$ and $t_j$ are the time frames corresponding to the temporal information of $\mathbf{s}_i$ and $\mathbf{s}_j$, respectively. As the available prior knowledge changes, one may define a kernel function different from that in Equation (7).
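A minimal sketch of such a separable spatiotemporal squared exponential covariance follows; the scaling factors `ell_s` and `ell_t` are illustrative hyperparameters, and the paper's actual values are not assumed here.

```python
import numpy as np

def spatiotemporal_kernel(s_i, s_j, t_i, t_j, ell_s=1.0, ell_t=1.0):
    """Separable squared exponential covariance: nearby locations sampled in
    nearby time frames are highly correlated; distant ones are nearly independent."""
    spatial = np.exp(-np.sum((np.asarray(s_i, dtype=float)
                              - np.asarray(s_j, dtype=float)) ** 2) / (2 * ell_s ** 2))
    temporal = np.exp(-((t_i - t_j) ** 2) / (2 * ell_t ** 2))
    return spatial * temporal
```

The covariance equals one for identical space-time inputs and decays monotonically with spatial or temporal separation, matching the smoothness and revisit assumptions stated above.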
Suppose that we have a dictionary of all feasible trajectories of the sensors through the region under study, and that each trajectory is associated with the set of locations along it at which measurements are collected. For a trajectory that has not yet been taken, these locations belong to the set of test data.
Using the GPR model, we obtain a prediction of the PoI throughout the region together with the amount of uncertainty (variance) associated with the predictions; the uncertainty can be evaluated via the kernel function defined in Equation (7). We denote by $\hat{f}_{k,p}$ and $\sigma^2_{k,p}$ the estimate quantifying the PoI and the associated variance at the $p$th location along the $k$th trajectory, respectively. In this setting, we assume that locations containing a high concentration of the physical PoI have a higher value of the quantifying function than the other locations. The goal is then to decide on the next trajectory of the sensors, based on the available data, the estimated data, and the amount of uncertainty over the region, in order to explore the interesting phenomenon. Although the GPR provides an estimate of the PoI over the whole region, the uncertainty is large at locations with almost no nearby measurements: the farther the sensors are from the measurements, the higher the variance becomes. Therefore, if we only emphasize refining existing measurements, the sensors lose the inclination to explore. In contrast, if we put more emphasis on the variance, the sensors are encouraged to choose trajectories far away from the previous trajectories, fulfilling the exploration objective of the mission; in this case, even if interesting phenomena are found by past trajectories, the sensors are reluctant to pass nearby again. One way of taking both data refinement and exploration into account is to construct the following optimization problem to decide on the trajectories of the sensors:

$\max_{k} \; \hat{F}_k + \lambda V_k,$  (8)

where $k$ denotes the $k$th trajectory from the dictionary of feasible trajectories that pass through the region under study, and $\lambda$ is a tuning parameter that balances the desire to explore against the desire to refine data in regions known or predicted to be of interest. Notice that, since the dictionary of feasible trajectories is defined so as to cover only the region under study, the maximization problem for small $\lambda$ will at most result in the selection of a trajectory at the boundary of the region. In Equation (8), $\hat{F}_k$ and $V_k$ are defined as

$\hat{F}_k = \sum_{p} \hat{f}_{k,p}, \qquad V_k = \sum_{p} \sigma^2_{k,p}.$  (9)
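The aggregation along a trajectory and the weighted selection can be sketched as follows, under the assumption (made here for illustration) that each per-trajectory score is the sum of the pointwise estimates or variances along that trajectory.

```python
import numpy as np

def trajectory_scores(mean, var, trajs):
    """Aggregate the predicted PoI and its variance along each candidate
    trajectory; `trajs` is a list of index arrays into the region grid."""
    f = np.array([mean[idx].sum() for idx in trajs])
    v = np.array([var[idx].sum() for idx in trajs])
    return f, v

def pick_trajectory(mean, var, trajs, lam):
    """Weighted-utility selection: trade off data refinement (high predicted
    PoI) against exploration (high variance) via the tuning parameter lam."""
    f, v = trajectory_scores(mean, var, trajs)
    return int(np.argmax(f + lam * v))
```

With `lam = 0` the rule is purely refinement-driven; a large `lam` makes it purely exploration-driven.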
The optimization problem in Equation (8) results in a single solution, which represents an optimum of a weighted utility function. Notice that there exists no globally optimal solution to the optimization problem in Equation (8), given the relatively small amount of information that can be obtained from the region under study, the tension between the two desiderata of exploration and data refinement, and the locality of the information that we obtain. The epistemic utility framework is well suited to these types of problems: it makes satisficing decisions such that the selected trajectories are informationally valuable. Applying epistemic utility amounts to defining two probability functions so that the decision maker meets both goals of exploration and data refinement. In this setting, each possible trajectory of the sensors is considered a hypothesis. In contrast to the optimization problem in Equation (8), the decision maker in Equation (5) compares the credibility and rejectability probability functions with an emphasizing factor $b$, where $b$ denotes how much emphasis we place on the informational value obtained by rejecting propositions (trajectories) from the decision space. The boldness factor $b$ in Equation (5) represents the agent's willingness to reject propositions, which makes it entirely different from the tuning parameter in Equation (8). For $b \in (0, 1]$, a higher value of $b$ results in the rejection of more of the possible propositions (decisions); for example, setting $b = 1$ corresponds to rejecting as many propositions as possible in the decision space.
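The effect of the boldness factor can be illustrated with a small sketch. The acceptance test `credal >= b * reject` is one common form of Levi's rule and is assumed here for illustration; the exact inequality used in the paper may differ in strictness.

```python
def surviving_set(credal, reject, b):
    """Levi-style satisficing: a hypothesis (trajectory) survives when its
    credal probability is at least b times its rejection probability.
    Larger boldness b therefore rejects more hypotheses."""
    return [k for k in range(len(credal)) if credal[k] >= b * reject[k]]
```

As `b` grows from 0 toward its maximum, the set of surviving hypotheses shrinks monotonically.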
Based on the discussion above, we now formulate the problem using the epistemic utility framework presented in Section 2.2. Specifically, we consider the following two cases with their corresponding credal and rejection probability functions. In the first case, we define the informationally valuable trajectories as those that pass through regions with high entropy (that is, passing through those regions will resolve a large uncertainty). Correspondingly, we emphasize the rejection of trajectories that pass through regions with low entropy. The rejectability probability is therefore defined to emphasize rejecting the trajectories from the dictionary that carry lower uncertainty, in order to favor the desire for exploration. Since the kernel function defined in Equation (7) becomes small when the desired trajectory is far from previous trajectories, the uncertainty associated with such a trajectory, as defined in Equation (9), will be larger; this can be seen in Equation (2), where the covariance with the training data would be small and the predictive variance large as a result. The probability functions constructing the epistemic utility controller can be written as

$p_C(k) = \hat{F}_{k,n}, \qquad p_R(k) = \frac{V_{k,n}^{-1}}{\sum_j V_{j,n}^{-1}},$  (10)

where $\hat{F}_{k,n}$ and $V_{k,n}$ are defined as

$\hat{F}_{k,n} = \frac{\hat{F}_k}{\sum_j \hat{F}_j}, \qquad V_{k,n} = \frac{V_k}{\sum_j V_j},$  (11)

and where $\hat{F}_k$ and $V_k$ were defined in Equation (9). The subscript $n$ denotes that the functions in Equation (11) are normalized so as to act like probabilities. According to Equation (10), we assign credal probabilities based on the estimates along each possible trajectory, while the measures of uncertainty determine the rejection probabilities. Without loss of generality, we assume that the more interesting the behavior of the phenomenon at a location, the higher the value of the underlying quantifying function.
In the second case, we relate the less interesting phenomenon (low concentration of the quantity of interest) to the rejection probability function: the less interesting the predicted phenomenon along a possible trajectory, the higher the rejection of the corresponding hypothesis. The credal probability is evaluated via the amount of uncertainty each possible trajectory carries. The credal and rejection probabilities for case 2 are

$p_C(k) = V_{k,n}, \qquad p_R(k) = \frac{\hat{F}_{k,n}^{-1}}{\sum_j \hat{F}_{j,n}^{-1}},$  (12)

where $\hat{F}_{k,n}$ and $V_{k,n}$ were defined in Equation (11). Once the probability functions are computed for either of the two cases, we apply Levi's rule of epistemic utility. As a result, only those trajectories that satisfy

$p_C(k) \geq b\, p_R(k)$  (13)

are surviving hypotheses. The surviving options are defined by the set of all trajectories for which Equation (13) holds.
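A plausible concrete realization of the two cases can be sketched as follows, assuming sum-to-one normalization of the aggregated per-trajectory estimates `f` and variances `v`, with inverse weights on the rejection side; the exact normalizations in the paper's Equations (10)–(12) may differ in detail.

```python
import numpy as np

def normalize(w):
    """Scale a positive weight vector so it sums to one (acts like a probability)."""
    w = np.asarray(w, dtype=float)
    return w / w.sum()

def case1_probs(f, v):
    """Case 1: credal mass follows the predicted PoI along each trajectory;
    rejection mass is high where the aggregated uncertainty is low."""
    return normalize(f), normalize(1.0 / np.asarray(v, dtype=float))

def case2_probs(f, v):
    """Case 2: credal mass follows the uncertainty; rejection mass is high
    where the predicted PoI is low (less interesting trajectories)."""
    return normalize(v), normalize(1.0 / np.asarray(f, dtype=float))
```

Either pair can then be fed to Levi's rule to obtain the surviving trajectories.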
Remark 2. In Equations (10) and (12), we used the normalized versions of the estimates and the associated variances along each trajectory in order to follow the "sum to one" property of probability. However, one can remove the normalization step and redefine the boldness factor accordingly.

After applying the rule defined in Equation (13), the surviving option may not be unique, i.e., the cardinality of the surviving set may be greater than one. Each element of the surviving set is a satisficing hypothesis, meaning that it is both likely to be correct and of high informational value. In order to take an action, we seek to accept only one trajectory (one hypothesis). Reducing the number of elements in the surviving set is accomplished in the deliberation stage described in Section 2.2. Once the cardinality of the set has been reduced to a reasonable number, the tie-breaking stage comes into play to force the set to a unique element so that an action can be taken. For this purpose, one can apply the approach of Equation (6), as described in [37], which selects one hypothesis out of the surviving hypotheses as the next trajectory. Finally, after measurement, the data obtained from the selected trajectory are added to the training set U, and the whole process starts again.
Figure 2 illustrates the block diagram of the proposed framework.
In order to restrain the growth of the amount of training data fed to the GPR (and thus reduce complexity), the data collection block of Figure 2 retains only the measurements obtained from the last M visited trajectories of the sensors. Once a new trajectory is determined and the corresponding measurements are collected, the new information is added to the training set and the oldest set of measurements is discarded. The reason is that the oldest measurements may have very low correlation with the new measurements, as the PoI may have changed. This also avoids dealing with the inverse of an ever-growing covariance matrix of the training data as we continue collecting new measurements.
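This bounded training set can be maintained with a simple fixed-length buffer; `SlidingTrainingSet` is an illustrative name, not part of the paper.

```python
from collections import deque

class SlidingTrainingSet:
    """Retain only the measurements from the last M visited trajectories, so
    the GPR covariance matrix stays bounded in size and stale, weakly
    correlated data are discarded automatically."""

    def __init__(self, M):
        self.batches = deque(maxlen=M)  # deque drops the oldest batch itself

    def add(self, inputs, outputs):
        """Store one trajectory's worth of (input, output) measurements."""
        self.batches.append((list(inputs), list(outputs)))

    def training_data(self):
        """Flatten the retained batches into GPR training inputs and outputs."""
        X, y = [], []
        for xs, ys in self.batches:
            X.extend(xs)
            y.extend(ys)
        return X, y
```

Using `deque(maxlen=M)` means the discard step needs no explicit bookkeeping.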
4. Simulation Results
We demonstrate how the proposed framework is applied via a surrogate problem using a constellation of two satellites in low Earth orbit (LEO). In this particular problem, we cannot perform random sampling over the region; instead, we are restricted to follow specific trajectories once the orbits of the satellites (or the trajectories of the mobile sensors) are determined. Initially, the satellites move in predetermined orbital planes over the region under study. For simplicity, assume that the PoI remains unchanged during the sampling period. Figure 3 illustrates an example including the orbital planes of the satellites, where the rectangular slab indicates the region under study.
In Figure 3, the constellation is defined based on the Keplerian orbital elements: the semi-major axis (km), eccentricity, inclination (rad), and right ascension of the ascending node (R.A.A.N., rad). Since the PoI is assumed to be unchanged during the sampling period, the fastest-varying orbital elements, the true (mean) anomaly and the argument of perigee, are ignored; one could, however, take these two elements into account for the case where the PoI changes over the sampling period. The region under study shown in Figure 3 corresponds to the Earth coverage with a latitude range of [72.74°, 90°], a longitude range of [−37.84°, 19.27°], and an altitude range of [129 km, 629 km]. The PoI that we consider here is a simulated measure of total electron content (TEC) in the ionosphere. The TEC profile is illustrated in Figure 4, which is borrowed from [46].
In Figure 4, the highest and lowest quantities of the interesting phenomenon corresponding to the PoI, the highest and lowest electron densities, are shown in red and blue, respectively. We further assume that the PoI has the same profile over the altitude range under study ([129 km, 629 km]) as shown in Figure 4, along the z-axis of the rectangular region shown in Figure 3. This image can be thought of as a discretized 2D version of the region of interest defined by the pixel values. In the simulations, it is assumed that direct measurements of the PoI can be obtained along the current trajectories of the satellites. Since the electron density profile, the PoI for our case study, is assumed to be unchanged during the sampling period, the kernel function in Equation (7) simplifies to

$k(\mathbf{s}_i, \mathbf{s}_j) = \exp\!\left(-\frac{\|\mathbf{s}_i - \mathbf{s}_j\|^2}{2\ell_s^2}\right),$  (14)

where we fix the scaling factor $\ell_s$ and the spatial coordinates are defined by the pixel locations of the image shown in Figure 4.
From the initial trajectories shown in Figure 3, the corresponding set of measurements of the region under study is shown in Figure 5a. From the initial measurements, the GPR model defined in Equations (4) and (14) is applied to estimate the TEC throughout the region under study and to measure the associated uncertainty. The results are illustrated in Figure 5b,c.
The second stage of the proposed framework is applied to decide on the next trajectories of the satellites. Although the initial orbital planes were constructed directly from the Keplerian orbital elements, below we assume (for simplicity in this example) that each possible trajectory can pass through the region under study and measure the TEC with an Earth-coverage longitude resolution of 0.5°, as shown in Figure 4. The problem of designing the Keplerian orbital elements to generate the actual orbits corresponding to such trajectories is not the focus of this paper and is left as future work.
The epistemic utility controller is set with the agent's index of boldness $b$ and the credal and rejection probability functions in Equation (12). In other words, the rejection probability function is defined such that it tends to remove the trajectories corresponding to regions with low electron density, while the credal probability function encourages trajectories that visit regions with higher uncertainty. Figure 6 illustrates some of the results: the measurements corresponding to the selected trajectories, the measure of uncertainty, and the reconstructed profile.
According to Figure 6, the selected trajectories avoid revisiting the vicinity of areas that appear to contain low electron density. At the same time, the decision maker does not accept a trajectory in the immediate vicinity of the already chosen trajectories, even if high electron density has been detected in their neighborhood; this is shown by the measure of variance in the third row of Figure 6. More specifically, very few trajectories are selected in the subregions with low electron density, and those trajectories exhibit some sort of uniform sampling. In contrast, more trajectories are chosen in the interesting subregions, yet these trajectories do not tend to be next to each other.
In Figure 7, we compare the performance of the proposed framework for cases 1 and 2, defined in Equations (10) and (12), respectively. In Figure 7a, we show the percentage of measurements (training data) with respect to the total number of possible data points, if we were able to cover the entire region under study. Even after accumulating the measurements obtained from the initial and eleven successive trajectories, we still cover only around 16% of the complete data over the region. In Figure 7b, the total variance over the region versus the number of trajectories is illustrated. Here, the variance of the visited locations is set to zero, the variance of the unvisited locations is measured via the kernel function defined in Equation (14), and we sum the variances associated with all pixels. Finally, in Figure 7c, the peak signal-to-noise ratio (PSNR) between the true and the reconstructed TEC profiles is shown.
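The PSNR metric used in this comparison can be computed as follows; taking the peak from the true profile is an assumption made here for illustration.

```python
import numpy as np

def psnr(true_img, recon_img):
    """Peak signal-to-noise ratio (dB) between the true and reconstructed
    PoI profiles, with the peak taken from the true profile."""
    true_img = np.asarray(true_img, dtype=float)
    recon_img = np.asarray(recon_img, dtype=float)
    mse = np.mean((true_img - recon_img) ** 2)
    if mse == 0:
        return float("inf")  # perfect reconstruction
    return 10.0 * np.log10(true_img.max() ** 2 / mse)
```

Higher values indicate a reconstruction closer to the true TEC profile.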
According to Figure 7b, increasing the boldness factor for case 1 results in a decrease of the overall uncertainty in the region under study. This is because the rejection probability function corresponds to the inverse of the overall variance along the possible trajectories; therefore, increasing the boldness factor promotes selecting the trajectories along which higher uncertainty is predicted. In contrast, case 2 shows a different behavior, in which increasing the boldness factor forces the rejection of trajectories that are believed to collect measurements in subregions with low TEC. The PSNR evaluation depends not only on how the probabilities in the epistemic utility are constructed but also on the true behavior of the PoI. Since the PoI has smooth behavior, and we have already taken this fact into account, the PSNR increases as we increase the boldness factor for case 1. For case 2, reducing the boldness factor usually provides better performance in terms of PSNR, but the result also depends on where we sample. For example, one of the tested boldness settings did not show better performance than two others, because the controller decided on a trajectory that lies in a region with high TEC but close to the edge of that phenomenon. Since the kernel function in the GPR stage assumes smooth behavior, it expands the interesting phenomenon into the neighborhood of the selected trajectory, and thus the estimated PoI extends beyond the edges of the true PoI. This yields a decrease in the PSNR.
Finally, we consider two more cases to highlight the advantage of the proposed framework. In cases 3 and 4, the trajectories are selected based only on the variance or only on the quantity of the interesting phenomenon, respectively. Notice that case 3 tends to select the trajectories with the highest entropy (variance). Using the variance for sensor placement is widespread [17,18,47]; here, we have modified it so that it can be used for making decisions on the trajectories of mobile sensors rather than for pointwise sensor placement. In fact, cases 3 and 4 correspond to solving the optimization problem in Equation (8) when the estimate term and the variance term are discarded, respectively. Figure 8 and Figure 9 illustrate the obtained results.
As expected, Figure 8 shows that case 3 outperforms case 4 in terms of reducing the overall uncertainty in the region, since case 3 favors only the trajectories with the highest variance. In addition, due to the smoothness of the PoI, case 3 also shows a higher PSNR than case 4. Comparing the total variance and the PSNR of cases 1–4, it is clear that cases 1 and 2 collectively perform better in reducing the overall variance and increasing the PSNR. The trajectory selections of cases 3 and 4 are shown in Figure 9 for eleven successive sets of measurements added to the initial measurements. Comparing cases 3 and 4, both tend to sample the region uniformly; the difference is that case 3 uniformly samples the whole region, while case 4 uniformly samples the vicinity of the interesting phenomenon once some of it has been observed. Therefore, if there were any other interesting subregion, case 4 would give us no chance to observe it with a low number of trajectories. In contrast, cases 1 and 2 of our proposed framework apply an adaptive sampling structure that balances reducing the amount of uncertainty in the region against refining data in the vicinity of an interesting phenomenon.
Remark 3. For the purpose of showing only how the principles of the proposed framework can be applied, the simulations were carried out neglecting the constraint on the Δv budget of the satellites for the orbit transfers needed to reach the trajectories selected by the decision maker. Thus, in a realistic scenario, some of the selected trajectories may not be feasible. However, this issue can be resolved by discarding non-affordable trajectories from the dictionary of trajectories in the region under study and then applying Equation (13).