Article

A Linear Bayesian Updating Model for Probabilistic Spatial Classification

Department of Statistics, Central South University, Changsha 410012, Hunan, China
* Author to whom correspondence should be addressed.
Challenges 2016, 7(2), 21; https://doi.org/10.3390/challe7020021
Submission received: 8 September 2016 / Revised: 28 October 2016 / Accepted: 21 November 2016 / Published: 29 November 2016

Abstract

Categorical variables are common in spatial data analysis. Traditional analytical methods for deriving probabilities of class occurrence, such as kriging-family algorithms, have been hindered by the discrete characteristics of categorical fields. To address this challenge, this study introduces the theoretical background of the linear Bayesian updating (LBU) model for spatial classification through an expert system. The main purpose of this paper is to present the theoretical foundations of the LBU approach. Since the LBU idea originates from aggregating expert opinions and is not restricted by the conditional independence assumption (CIA), it may prove reasonably adequate for analyzing complex geospatial data sets, such as remote sensing images or area-class maps.

1. Introduction

Categorical spatial data, such as lithofacies, land-use/land-cover classifications, and mineralization phases, are widely investigated geographical and geological information sources. They are typically represented by mutually exclusive and collectively exhaustive classes and visualized as area-class maps [1]. In the geo-information context, Rao’s quadratic diversity was used in [2] to measure the scale-dependent landscape structure. In the geological counterpart, a spatial hidden Markov chain model was employed in [3] for estimation of petroleum reservoir categorical variables. As geostatistical models, the Markov chain random field (MCRF) theory [4] and the Markov chain sequential simulation (MCSS) algorithm [5] are common choices for the prediction of categorical spatial data. They have been widely used in spatial-related fields with satisfactory results. However, the MCRF approach is based on a conditional independence assumption (CIA), which may be inappropriate due to complex data interaction in a spatial context [6]. The Tau model [7] and the Nu expression [8] introduce additional weights to relax the assumption of conditional independence. However, these power or multiplication relationships between multi-point pre-posterior probabilities and two-point conditional probabilities involve some subjective guesswork and may not be suitable in real-world spatial analysis. In the generalized linear mixed model (GLMM) [6], intermediate, latent, spatially correlated, normal variables are assumed for the observable non-normal responses to account for spatial dependence information. The random effects are always assumed to follow a normal distribution in the GLMM. Our concern here is whether the latent variables for different categories can be assumed to be independent of each other at the same location.
Generally speaking, the spatial classification problem can be regarded as combining two-point transition probabilities into a multi-point conditional probability. A formal introduction to most of the available approaches for aggregating probability distributions in geosciences can be found in [9]. Our task is to use the probability pooling method for spatial classification based on the pioneering work of [10,11]. We build on these earlier studies and interpret transition probabilities as expert opinions. The transition probabilities are obtained by the transiogram [12] spatial measure. The remainder of this paper is organized as follows. We begin by introducing the basic forms of the linear Bayesian updating (LBU) method in Section 2. In Section 3, we give detailed proofs of some propositions introduced by [10,11] which have not yet been proven. A real-world case study is given in Section 4. Finally, conclusions and future challenges are discussed in Section 5.

2. Linear Bayesian Updating

Consider the spatial locations $x_0, x_1, \ldots, x_n$ in remote sensing images or area-class maps. We use $A$ and $D_1, \ldots, D_n$ to represent events in the sample spaces of the categorical random variables $C(x_0)$ and $C(x_1), \ldots, C(x_n)$, respectively, and $\bar{A}$ denotes the complementary event of $A$. In the case of categorical data, let $\mathcal{A}$ be the finite set of events in the sample space $\Omega$ such that the events $A_1, A_2, \ldots, A_K$ of $\mathcal{A}$ are mutually exclusive and collectively exhaustive. Obviously,
$$A_j = \begin{cases} 1 & \text{if } C(x_0) = j, \quad j = 1, 2, \ldots, K \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
In the subsequent discussions, $A$ will be used as a general notation for $A_j$. We treat the $n$ neighboring events $D_1, D_2, \ldots, D_n$ as experts, and consider the conditional probabilities $P(A \mid D_i)$ as expert opinions $Q_i$ for the occurrence of $A$, an event of interest. The experts’ opinions are regarded as random variables $Q_i$ whose values $q_i$, $1 \le i \le n$, are to be revealed to the decision maker (DM). The posterior probability of $A$ given $Q = q$ is then $p^*(q)$.
The original LBU method was first proposed by [10] in statistical science in the form
$$p^*(q) = p + \sum_{i=1}^{n} \lambda_i (q_i - \mu_i) \tag{2}$$
with possibly negative weights, $\lambda_i$, expressing the amounts of correlation between each $Q_i$ and $A$. Here $p = P(A)$ is the prior probability of $A$ and $\mu_i$ denotes the mathematical expectation of $Q_i$. When $\mu_1 = \cdots = \mu_n = p$ and $\lambda_i \ge 0$, Equation (2) reduces to the linear opinion pool [13]
$$p^*(q) = \lambda_0 p + \sum_{i=1}^{n} \lambda_i q_i \quad \text{subject to} \quad \sum_{i=0}^{n} \lambda_i = 1 \tag{3}$$
where the DM is considered as one of the experts. Our LBU method closely follows that of [10], which is proved to be the only formula satisfying $\int p^*(q)\, dF(q) = p$ for all distributions $F$ with mean vector $\mu$. The highlight of our LBU model lies in the fact that the random variable $Q_i$ has been replaced by the transition probability, a measure of spatial continuity.
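The relationship between Equation (2) and the linear opinion pool of Equation (3) can be checked numerically. The following is a minimal sketch rather than the authors' implementation: the function names `lbu_posterior` and `linear_opinion_pool` and all numerical values are illustrative assumptions. Setting every $\mu_i = p$ in Equation (2) gives $p^*(q) = (1 - \sum_i \lambda_i)\,p + \sum_i \lambda_i q_i$, so the DM's weight is $\lambda_0 = 1 - \sum_i \lambda_i$.

```python
import numpy as np

def lbu_posterior(p, q, lam, mu):
    """Equation (2): p*(q) = p + sum_i lam_i * (q_i - mu_i)."""
    q, lam, mu = (np.asarray(v, dtype=float) for v in (q, lam, mu))
    return float(p + np.dot(lam, q - mu))

def linear_opinion_pool(p, q, lam):
    """Equation (3), with lam_0 = 1 - sum(lam) so that all weights sum to one."""
    q, lam = np.asarray(q, dtype=float), np.asarray(lam, dtype=float)
    lam0 = 1.0 - lam.sum()
    return float(lam0 * p + np.dot(lam, q))

# Illustrative numbers only (not taken from the paper).
p = 0.30                      # prior probability of A
q = [0.50, 0.20, 0.40]        # expert opinions q_i
lam = [0.20, 0.10, 0.30]      # non-negative weights lam_i

lbu_value = lbu_posterior(p, q, lam, mu=[p, p, p])   # mu_i = p for every expert
pool_value = linear_opinion_pool(p, q, lam)
print(lbu_value, pool_value)  # both forms give the same posterior
```

With these inputs both functions return 0.36, which illustrates that the opinion pool is the special case of Equation (2) in which every expert's expected opinion equals the prior.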

3. Theoretical Foundations of Linear Bayesian Updating

3.1. Parameter Ranges for Linear Bayesian Updating

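Although the LBU model has been used in [10,11], it is our conviction that many theoretical challenges need to be solved to better develop this method for further use. A legitimate posterior probability can be obtained only when $\lambda_i$ obeys a number of inequalities [10]. Since $p^*(q)$ is a probability, it must satisfy $0 \le p^*(q) \le 1$, that is to say,
$$\begin{cases} p + \sum_{i=1}^{n} \lambda_i (q_i - \mu_i) \ge 0 \\ p + \sum_{i=1}^{n} \lambda_i (q_i - \mu_i) \le 1. \end{cases} \tag{4}$$
Through algebraic transformation, (4) can be simplified as
$$\begin{cases} 1 + \dfrac{\sum_{i=1}^{n} \lambda_i q_i}{p} - \dfrac{\sum_{i=1}^{n} \lambda_i \mu_i}{p} \ge 0 \\[2ex] \dfrac{\sum_{i=1}^{n} \lambda_i q_i - \sum_{i=1}^{n} \lambda_i \mu_i}{1 - p} \le 1. \end{cases} \tag{5}$$
Suppose that all $\lambda_i$ are positive. Since $0 \le q_i \le 1$, the first inequality is tightest at $q_i = 0$ and the second at $q_i = 1$, so as long as
$$\begin{cases} \dfrac{\sum_{i=1}^{n} \lambda_i \mu_i}{p} \le 1 \\[2ex] \dfrac{\sum_{i=1}^{n} \lambda_i (1 - \mu_i)}{1 - p} \le 1, \end{cases} \tag{6}$$
(5) can be satisfied. Therefore, if the DM considers that all $\lambda_i$ are positive, the most common case, then they must be chosen so that
$$\max \left\{ \sum_{i=1}^{n} \lambda_i \mu_i / p, \; \sum_{i=1}^{n} \lambda_i (1 - \mu_i) / (1 - p) \right\} \le 1, \tag{7}$$
which can be regarded as a sufficient but not necessary condition of the LBU method. Whenever (6), or equivalently (7), is satisfied, Equation (2) is guaranteed to yield a valid probability for every $q$ with $0 \le q_i \le 1$.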
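Condition (7) is easy to check before the weights are used. The sketch below is only an illustration under assumed values; the helper names `positive_weight_bound` and `lbu_extremes` are hypothetical. It evaluates the left-hand side of (7) and, for comparison, the smallest and largest values that Equation (2) can take over $0 \le q_i \le 1$ when all weights are positive.

```python
import numpy as np

def positive_weight_bound(p, lam, mu):
    """Left-hand side of condition (7); the condition holds when the result is <= 1."""
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    if np.any(lam <= 0):
        raise ValueError("condition (7) is stated for strictly positive weights")
    return float(max(np.dot(lam, mu) / p, np.dot(lam, 1.0 - mu) / (1.0 - p)))

def lbu_extremes(p, lam, mu):
    """Smallest and largest value of Equation (2) over q in [0, 1]^n (positive weights)."""
    lam = np.asarray(lam, dtype=float)
    mu = np.asarray(mu, dtype=float)
    lowest = p - np.dot(lam, mu)           # attained at q = (0, ..., 0)
    highest = p + np.dot(lam, 1.0 - mu)    # attained at q = (1, ..., 1)
    return float(lowest), float(highest)

# Illustrative values only (not taken from the paper).
p = 0.30
mu = [0.30, 0.25, 0.35]

bound_ok = positive_weight_bound(p, [0.2, 0.1, 0.3], mu)
low_ok, high_ok = lbu_extremes(p, [0.2, 0.1, 0.3], mu)

bound_bad = positive_weight_bound(p, [1.0, 1.0, 1.0], mu)
low_bad, high_bad = lbu_extremes(p, [1.0, 1.0, 1.0], mu)

print(bound_ok, (low_ok, high_ok))     # bound below one, extremes inside [0, 1]
print(bound_bad, (low_bad, high_bad))  # bound above one, extremes leave [0, 1]
```

The first weight vector satisfies the bound and keeps Equation (2) inside the unit interval, whereas the second violates it and produces a negative lower extreme.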

3.2. Interpreting Parameters as Regression Coefficients for Linear Bayesian Updating

The LBU model given above has some parameters, which need to be learned or estimated by the DM. Let $\Sigma_Q$ denote the covariance matrix of $Q$, and $\sigma_{p^*Q}$ be the vector of covariances between $p^*$ and $Q$. Let $t$ denote matrix transposition and
$$Q = (Q_1, Q_2, \ldots, Q_n), \quad \mu = (\mu_1, \mu_2, \ldots, \mu_n), \quad \lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)^t. \tag{8}$$
Using the definition of expectation, we have
$$E(A \mid Q) = p^*, \quad E(E(A \mid Q)) = E(A) = E(p^*) = p. \tag{9}$$
In addition, Equation (2) can be given as
$$p^* = p + (Q - \mu)\lambda \quad \text{and} \quad \mu = E(Q), \tag{10}$$
i.e.,
$$p^* = p + (Q_1 - \mu_1, Q_2 - \mu_2, \ldots, Q_n - \mu_n)(\lambda_1, \lambda_2, \ldots, \lambda_n)^t. \tag{11}$$
Moving $p$ to the left-hand side, premultiplying both sides by $(Q - \mu)^t$, and taking expectations yields
$$\begin{aligned} E[(Q - \mu)^t (p^* - p)] &= E[(Q_1 - \mu_1, \ldots, Q_n - \mu_n)^t (Q_1 - \mu_1, \ldots, Q_n - \mu_n)](\lambda_1, \lambda_2, \ldots, \lambda_n)^t \\ &= \operatorname{cov}(Q_1, Q_2, \ldots, Q_n)(\lambda_1, \lambda_2, \ldots, \lambda_n)^t = \Sigma_Q \lambda. \end{aligned} \tag{12}$$
Since the left-hand side is $\sigma_{p^*Q}$, we have
$$\lambda = \Sigma_Q^{-1} \sigma_{p^*Q} \tag{13}$$
provided that the covariance matrix $\Sigma_Q$ is invertible.
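The covariance form of the weights can be illustrated on simulated opinions. This is a sketch under stated assumptions, not the authors' procedure: the opinions `Q`, the prior `p`, and the weights `lam_true` are synthetic, and the posterior is generated exactly from the linear model so that the weights are recoverable. The sample covariance matrix of the opinions and the sample covariances between the posterior and each opinion are then combined as $\Sigma_Q^{-1}\sigma_{p^*Q}$.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 500, 3                               # samples and experts (illustrative sizes)
p = 0.30                                    # assumed prior probability
mu = np.full(n, 0.30)                       # assumed expectations of the opinions
lam_true = np.array([0.20, 0.10, 0.30])     # assumed weights used to simulate p*

Q = rng.uniform(0.10, 0.50, size=(m, n))    # synthetic expert opinions
p_star = p + (Q - mu) @ lam_true            # posterior generated by the linear model

Sigma_Q = np.cov(Q, rowvar=False)           # n x n sample covariance of the opinions
sigma_pQ = np.array([np.cov(Q[:, i], p_star)[0, 1] for i in range(n)])

# Solve Sigma_Q @ lam = sigma_pQ instead of forming the inverse explicitly.
lam_hat = np.linalg.solve(Sigma_Q, sigma_pQ)
print(lam_hat)
```

Because the simulated posterior is exactly linear in the opinions, `lam_hat` reproduces `lam_true` up to floating-point error; with real transition probabilities the agreement would only be approximate.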
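Suppose we have $m$ samples in the training set, and consider the regression model
$$Y = XB + \varepsilon, \tag{14}$$
where
$$Y = \begin{pmatrix} p_1^* - p \\ p_2^* - p \\ \vdots \\ p_m^* - p \end{pmatrix}, \quad X = \begin{pmatrix} 1 & q_{11} - \mu_1 & \cdots & q_{1n} - \mu_n \\ 1 & q_{21} - \mu_1 & \cdots & q_{2n} - \mu_n \\ \vdots & \vdots & & \vdots \\ 1 & q_{m1} - \mu_1 & \cdots & q_{mn} - \mu_n \end{pmatrix}, \quad B = \begin{pmatrix} b_0 \\ b_1 \\ \vdots \\ b_n \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix}, \tag{15}$$
and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_m)^t$ denotes the random errors. Provided that the experts do not have a linearly dependent relationship, the least squares estimation of the regression coefficients yields
$$\hat{B} = (X^t X)^{-1} (X^t Y), \tag{16}$$
which can be written out in matrix form as
$$\begin{aligned} \hat{B} &= \left[ \begin{pmatrix} 1 & 1 & \cdots & 1 \\ q_{11} - \mu_1 & q_{21} - \mu_1 & \cdots & q_{m1} - \mu_1 \\ q_{12} - \mu_2 & q_{22} - \mu_2 & \cdots & q_{m2} - \mu_2 \\ \vdots & \vdots & & \vdots \\ q_{1n} - \mu_n & q_{2n} - \mu_n & \cdots & q_{mn} - \mu_n \end{pmatrix} \begin{pmatrix} 1 & q_{11} - \mu_1 & \cdots & q_{1n} - \mu_n \\ 1 & q_{21} - \mu_1 & \cdots & q_{2n} - \mu_n \\ \vdots & \vdots & & \vdots \\ 1 & q_{m1} - \mu_1 & \cdots & q_{mn} - \mu_n \end{pmatrix} \right]^{-1} \begin{pmatrix} 1 & \cdots & 1 \\ q_{11} - \mu_1 & \cdots & q_{m1} - \mu_1 \\ q_{12} - \mu_2 & \cdots & q_{m2} - \mu_2 \\ \vdots & & \vdots \\ q_{1n} - \mu_n & \cdots & q_{mn} - \mu_n \end{pmatrix} \begin{pmatrix} p_1^* - p \\ p_2^* - p \\ \vdots \\ p_m^* - p \end{pmatrix} \\ &= \begin{pmatrix} m & m(\bar{Q}_1 - \mu_1) & \cdots & m(\bar{Q}_n - \mu_n) \\ m(\bar{Q}_1 - \mu_1) & \sum_{i=1}^{m} (q_{i1} - \mu_1)^2 & \cdots & \sum_{i=1}^{m} (q_{i1} - \mu_1)(q_{in} - \mu_n) \\ m(\bar{Q}_2 - \mu_2) & \sum_{i=1}^{m} (q_{i2} - \mu_2)(q_{i1} - \mu_1) & \cdots & \sum_{i=1}^{m} (q_{i2} - \mu_2)(q_{in} - \mu_n) \\ \vdots & \vdots & & \vdots \\ m(\bar{Q}_n - \mu_n) & \sum_{i=1}^{m} (q_{in} - \mu_n)(q_{i1} - \mu_1) & \cdots & \sum_{i=1}^{m} (q_{in} - \mu_n)^2 \end{pmatrix}^{-1} \begin{pmatrix} m\overline{p^*} - mp \\ \sum_{i=1}^{m} (p_i^* - p)(q_{i1} - \mu_1) \\ \sum_{i=1}^{m} (p_i^* - p)(q_{i2} - \mu_2) \\ \vdots \\ \sum_{i=1}^{m} (p_i^* - p)(q_{in} - \mu_n) \end{pmatrix}. \end{aligned} \tag{17}$$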
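When the sample size is large enough, the sample means are approximately equal to the corresponding expectations, and we have
$$\begin{aligned} \hat{B} &\approx \frac{1}{m} \begin{pmatrix} 1 & E(Q_1) - \mu_1 & \cdots & E(Q_n) - \mu_n \\ E(Q_1) - \mu_1 & E[(Q_1 - \mu_1)^2] & \cdots & E[(Q_1 - \mu_1)(Q_n - \mu_n)] \\ E(Q_2) - \mu_2 & E[(Q_2 - \mu_2)(Q_1 - \mu_1)] & \cdots & E[(Q_2 - \mu_2)(Q_n - \mu_n)] \\ \vdots & \vdots & & \vdots \\ E(Q_n) - \mu_n & E[(Q_n - \mu_n)(Q_1 - \mu_1)] & \cdots & E[(Q_n - \mu_n)^2] \end{pmatrix}^{-1} m \begin{pmatrix} E(p^*) - p \\ E[(p^* - p)(Q_1 - \mu_1)] \\ E[(p^* - p)(Q_2 - \mu_2)] \\ \vdots \\ E[(p^* - p)(Q_n - \mu_n)] \end{pmatrix} \\ &= \frac{1}{m} \begin{pmatrix} 1 & E(Q_1) - \mu_1 & \cdots & E(Q_n) - \mu_n \\ E(Q_1) - \mu_1 & & & \\ \vdots & & \operatorname{cov}(Q_1, Q_2, \ldots, Q_n) & \\ E(Q_n) - \mu_n & & & \end{pmatrix}^{-1} m \begin{pmatrix} E(p^*) - p \\ \sigma_{p^*Q} \end{pmatrix} = \begin{pmatrix} 0 \\ \Sigma_Q^{-1} \sigma_{p^*Q} \end{pmatrix}_{(n+1) \times 1}. \end{aligned} \tag{18}$$
The last equality holds because $E(Q_i) = \mu_i$ and $E(p^*) = p$, so the first row and first column of the matrix reduce to $(1, 0, \ldots, 0)$ and the first entry of the vector vanishes.
Therefore, our derivation gives an explanation of the parameters $\lambda_i$ in the LBU model as the linear regression coefficients of $p^* - p$ with respect to $Q - \mu$ when the neighboring events are not linearly dependent. As in multiple regression, each $\lambda_i$ can thus be thought of as a measure of the additional information that the $i$th expert provides over and above the other experts and what the DM already knows.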
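The regression interpretation can be tried out with an ordinary least squares fit. The sketch below assumes synthetic opinions, assumed true weights `lam_true`, and a small Gaussian noise term; none of these values come from the paper. The design matrix has a column of ones followed by the centered opinions $q_{ij} - \mu_j$, and the response is $p_i^* - p$, as in the regression model above.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 400, 3                                   # illustrative sample and expert counts
p = 0.30                                        # assumed prior probability
mu = np.array([0.30, 0.25, 0.35])               # assumed expectations of the opinions
lam_true = np.array([0.20, 0.10, 0.30])         # assumed weights used for simulation

Q = mu + rng.uniform(-0.10, 0.10, size=(m, n))  # synthetic opinions centered at mu
noise = rng.normal(0.0, 0.005, size=m)          # small random error term
p_star = p + (Q - mu) @ lam_true + noise

X = np.column_stack([np.ones(m), Q - mu])       # columns: intercept, q_ij - mu_j
Y = p_star - p                                  # response: p*_i - p

# Least squares solution of Y = X B + error.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
intercept, lam_hat = B_hat[0], B_hat[1:]
print(intercept, lam_hat)
```

In this setting the fitted intercept is close to zero and the slopes are close to the assumed weights, which mirrors the large-sample result that the intercept vanishes and the remaining coefficients equal $\Sigma_Q^{-1}\sigma_{p^*Q}$.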

3.3. Invertible Conditions of Linear Bayesian Updating

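We now discuss when the linear system of Equation (2) is invertible. Equation (2) can be rewritten as
$$p^*(q) = p - \sum_{i=1}^{n} \lambda_i \mu_i + (q_1, q_2, \ldots, q_n)(\lambda_1, \lambda_2, \ldots, \lambda_n)^t. \tag{19}$$
Therefore,
$$(q_1, q_2, \ldots, q_n)(\lambda_1, \lambda_2, \ldots, \lambda_n)^t = p^*(q) - p + (\mu_1, \mu_2, \ldots, \mu_n)(\lambda_1, \lambda_2, \ldots, \lambda_n)^t. \tag{20}$$
Postmultiplying both sides by $(\lambda_1, \lambda_2, \ldots, \lambda_n)$, we have
$$(q_1, q_2, \ldots, q_n)\Lambda = [p^*(q) - p](\lambda_1, \lambda_2, \ldots, \lambda_n) + (\mu_1, \mu_2, \ldots, \mu_n)\Lambda, \tag{21}$$
where
$$\Lambda = (\lambda_1, \lambda_2, \ldots, \lambda_n)^t (\lambda_1, \lambda_2, \ldots, \lambda_n) = \begin{pmatrix} \lambda_1^2 & \lambda_1 \lambda_2 & \cdots & \lambda_1 \lambda_n \\ \lambda_2 \lambda_1 & \lambda_2^2 & \cdots & \lambda_2 \lambda_n \\ \vdots & \vdots & & \vdots \\ \lambda_n \lambda_1 & \lambda_n \lambda_2 & \cdots & \lambda_n^2 \end{pmatrix}. \tag{22}$$
For $n \ge 2$, $\Lambda$ is the outer product of a vector with itself and therefore has rank at most one, so the determinant $|\Lambda| = 0$ and $\Lambda$ is singular. Therefore, $q_1, q_2, \ldots, q_n$ cannot be uniquely determined. Thus, the linear system of Equation (2) is not invertible under these circumstances.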
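The non-invertibility for several experts can be seen numerically. The sketch below uses illustrative weights and opinions that are assumptions, not data from the paper. It builds the outer-product matrix of the weights, confirms that it has rank one and a vanishing determinant, and exhibits two different opinion vectors that lead to the same posterior.

```python
import numpy as np

# Illustrative values only (not taken from the paper).
p = 0.30
mu = np.array([0.30, 0.25, 0.35])
lam = np.array([0.20, 0.10, 0.30])

Lambda = np.outer(lam, lam)                 # the matrix lam^t lam discussed above
rank = np.linalg.matrix_rank(Lambda)
det = np.linalg.det(Lambda)

def lbu_posterior(q):
    """Equation (2) for the fixed p, lam and mu defined above."""
    return float(p + np.dot(lam, np.asarray(q, dtype=float) - mu))

q_a = [0.50, 0.20, 0.40]
q_b = [0.40, 0.40, 0.40]                    # a different opinion vector
post_a, post_b = lbu_posterior(q_a), lbu_posterior(q_b)

print(rank, det)        # rank one, determinant numerically zero
print(post_a, post_b)   # identical posteriors from different opinions
```

Since both opinion vectors have the same weighted sum, the posterior alone cannot tell them apart, which is the practical meaning of the singular matrix.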
The linear system of Equation (2) is invertible only when a single expert is consulted, i.e.,
$$p^*(q) = p + \lambda (q - \mu). \tag{23}$$
In this case, provided that $\lambda \ne 0$, the left inverse of the system is
$$q = \frac{p^* - p}{\lambda} + \mu, \tag{24}$$
where $p^*$ is the input. The right inverse can be given as
$$q_i = \frac{p^*(q_i) - p}{\lambda_i} + \mu_i, \tag{25}$$
where $p^*(q_i)$ is the desired output after consultation. The required $q_i$ can be compared with the corresponding transition probability $P(A \mid D_i)$. If a large deviation emerges, the expert $Q_i$ may not be convincing.
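For the single-expert case, the forward model and its inverse can be written in a few lines. This is a minimal sketch with hypothetical function names and illustrative numbers; the observed transition probability used for comparison is an assumed value. The inverse returns the opinion that an expert would need to hold in order to produce a desired posterior.

```python
def lbu_single(q, p, lam, mu):
    """Single-expert LBU: p*(q) = p + lam * (q - mu)."""
    return p + lam * (q - mu)

def lbu_single_inverse(p_star, p, lam, mu):
    """Opinion q that yields the posterior p_star (requires lam != 0)."""
    if lam == 0:
        raise ValueError("the single-expert system is not invertible when lam is zero")
    return (p_star - p) / lam + mu

# Illustrative values only (not taken from the paper).
p, lam, mu = 0.30, 0.50, 0.25
observed_q = 0.45                                   # assumed transition probability P(A | D_i)

posterior = lbu_single(observed_q, p, lam, mu)      # forward pass
recovered_q = lbu_single_inverse(posterior, p, lam, mu)

desired_posterior = 0.50
required_q = lbu_single_inverse(desired_posterior, p, lam, mu)
deviation = abs(required_q - observed_q)            # gap between required and observed opinion
print(posterior, recovered_q, required_q, deviation)
```

Here the forward pass gives a posterior of 0.40, the inverse recovers the original opinion, and a desired posterior of 0.50 would require an opinion of 0.65, a deviation of 0.20 from the observed value.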

4. Case Study

We now present a case study to demonstrate the use of the method. The Swiss Jura data set [14] is used, where four lithology types are sampled in a 14.5 km² region. These rock types are Argovian, Kimmeridgian, Sequanian and Quaternary; the corresponding class proportions of these four categories are 20.46%, 32.82%, 24.32%, and 22.39%, respectively. We have 259 samples in total for prediction (Figure 1).
The first task is to obtain the expert opinions (i.e., transition probabilities) in spatial scenarios. We use the 10 nearest samples for prediction, so there are always 10 experts for consultation. The detailed procedures for estimating the transition probability are beyond the scope of this work. Discussions of transiogram fitting can be found in [1,11]. Table 1 shows only the descriptive statistics of the transition probabilities.
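Selecting the nearest samples that act as experts is a simple distance computation. The sketch below is an assumption about how this step could be coded, not the procedure used in the case study: the helper `k_nearest_indices` and the toy coordinates are hypothetical, and Euclidean distance in the plane is assumed.

```python
import numpy as np

def k_nearest_indices(coords, target, k=10):
    """Indices and distances of the k samples closest to the target location."""
    coords = np.asarray(coords, dtype=float)
    target = np.asarray(target, dtype=float)
    distances = np.linalg.norm(coords - target, axis=1)   # Euclidean distances
    order = np.argsort(distances, kind="stable")[:k]      # nearest first
    return order, distances[order]

# Toy sample coordinates (illustrative only, not the Jura data).
sample_coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]
unsampled_location = [0.1, 0.1]

neighbor_idx, neighbor_dist = k_nearest_indices(sample_coords, unsampled_location, k=3)
print(neighbor_idx, neighbor_dist)
```

Each selected neighbor would then contribute one transition probability, evaluated from a fitted transiogram at the corresponding distance, as the opinion of one expert.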
After obtaining the expert opinions, we can use the regression model represented by Equation (2) to estimate the linear weights in spatial classification. Given that multiple neighbors will be involved in spatial scenarios most of the time, the LBU should often be a multivariable linear regression model. With the estimated regression coefficients, we can use the maximum a posteriori (MAP) probability criterion for classification [11].
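The classification step can be sketched as follows. The class proportions reported above are used as priors, but everything else is an assumption for illustration: two neighbors instead of ten, made-up opinions, equal weights, and equal expectations. One linear model is evaluated per class, and the class with the largest score is selected under the MAP criterion; the scores are not renormalized here.

```python
import numpy as np

def lbu_class_scores(prior, Q, lam, mu):
    """Equation (2) evaluated per class: prior_k + sum_i lam_ki * (q_ki - mu_ki)."""
    prior, Q, lam, mu = (np.asarray(v, dtype=float) for v in (prior, Q, lam, mu))
    return prior + np.sum(lam * (Q - mu), axis=1)

def map_class(scores):
    """Class label (1-based) with the maximum a posteriori score."""
    return int(np.argmax(scores)) + 1

# Priors: class proportions reported for the four lithology types.
prior = [0.2046, 0.3282, 0.2432, 0.2239]

# Hypothetical opinions of two neighbors for each of the four classes (rows).
Q = [[0.60, 0.50],
     [0.20, 0.25],
     [0.10, 0.15],
     [0.10, 0.10]]
lam = np.full((4, 2), 0.30)     # assumed regression weights
mu = np.full((4, 2), 0.25)      # assumed expectations of the opinions

scores = lbu_class_scores(prior, Q, lam, mu)
predicted = map_class(scores)
print(scores, predicted)
```

With these inputs the first class receives the highest score (about 0.3846) and is therefore the predicted class.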
The final prediction results are shown in Figure 2. We obtain an overall classification accuracy of 82.63% (214 out of 259). To better reflect the prediction accuracy, the precision for each lithofacies is illustrated in Figure 3.
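The two accuracy measures mentioned here can be computed as in the following sketch. Only the counts 214 and 259 come from the text; the label vectors are toy data, and the helper names are hypothetical. Precision for a class is taken as the fraction of locations predicted as that class whose true class matches.

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """Fraction of locations whose predicted class equals the true class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def per_class_precision(y_true, y_pred, classes):
    """Precision per class: correct predictions of a class / all predictions of that class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    result = {}
    for c in classes:
        predicted_c = y_pred == c
        n_predicted = int(predicted_c.sum())
        result[c] = float(np.mean(y_true[predicted_c] == c)) if n_predicted else float("nan")
    return result

# Reported counts from the case study.
reported_accuracy = round(100 * 214 / 259, 2)

# Toy labels (illustrative only).
y_true = [1, 1, 2, 2, 3, 4, 4, 4]
y_pred = [1, 2, 2, 2, 3, 4, 4, 1]
toy_accuracy = overall_accuracy(y_true, y_pred)
toy_precision = per_class_precision(y_true, y_pred, classes=[1, 2, 3, 4])
print(reported_accuracy, toy_accuracy, toy_precision)
```

The reported counts reproduce the stated 82.63%, while the toy labels give an accuracy of 0.75 with class-wise precisions of 0.5, 2/3, 1.0 and 1.0.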

5. Conclusions

In this work, we complete the theoretical foundations of the LBU model for the prediction of categorical spatial data. We have enriched our previous findings [11] by adding rigorous theoretical proofs for the LBU method. To show how the LBU model can work in spatial settings, a real-world case study has also been carried out. As pointed out by [11], our method can also be generalized to nonlinear systems, where more confident probability forecasting results can be obtained.
In the proposed model, the choice of the size of a neighborhood can be regarded as a variable selection problem. Involving more neighboring samples is likely to boost the prediction accuracy for the training set, but it may be computation-intensive and accompanied by a higher generalization error, i.e., overfitting. Determining the optimal number of neighbors will be the focus of our future work.

Acknowledgments

This work is supported by the National Key Research and Development Programs of China (No. 2016YFB0502601; No. 2016YFB0502303) and the Fundamental Research Funds for the Central Universities of Central South University (No. 2016zzts011). Special thanks to the editors and two anonymous reviewers for their constructive comments and suggestions. The authors are also indebted to Ying Chen, Danhua Chen, Ruizhi Zhang and Wuyue Shen for their critical reviews of the paper.

Author Contributions

Xiang Huang and Zhizhong Wang conceived and designed the proofs; Xiang Huang wrote the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cao, G.; Kyriakidis, P.C.; Goodchild, M.F. Combining spatial transition probabilities for stochastic simulation of categorical fields. Int. J. Geogr. Inf. Sci. 2011, 25, 1773–1791.
  2. Ricotta, C.; Carranza, M.L. Measuring scale-dependent landscape structure with Rao’s quadratic diversity. ISPRS Int. J. Geo-Inf. 2013, 2, 405–412.
  3. Huang, X.; Li, J.; Liang, Y.; Wang, Z.; Guo, J.; Jiao, P. Spatial hidden Markov chain models for estimation of petroleum reservoir categorical variables. J. Petrol. Explor. Prod. Technol. 2016.
  4. Li, W. Markov chain random fields for estimation of categorical variables. Math. Geol. 2007, 39, 321–335.
  5. Li, W. A fixed-path Markov chain algorithm for conditional simulation of discrete spatial variables. Math. Geol. 2007, 39, 159–176.
  6. Cao, G.; Kyriakidis, P.C.; Goodchild, M.F. A multinomial logistic mixed model for the prediction of categorical spatial data. Int. J. Geogr. Inf. Sci. 2011, 25, 2071–2086.
  7. Krishnan, S. The tau model for data redundancy and information combination in earth sciences: Theory and application. Math. Geosci. 2008, 40, 705–727.
  8. Polyakova, E.I.; Journel, A.G. The Nu expression for probabilistic data integration. Math. Geol. 2007, 39, 715–733.
  9. Allard, D.; Comunian, A.; Renard, P. Probability aggregation methods in geoscience. Math. Geosci. 2012, 44, 545–581.
  10. Genest, C.; Schervish, M.J. Modeling expert judgments for Bayesian updating. Ann. Stat. 1985, 13, 1198–1212.
  11. Huang, X.; Wang, Z.; Guo, J. Prediction of categorical spatial data via Bayesian updating. Int. J. Geogr. Inf. Sci. 2016, 30, 1426–1449.
  12. Li, W. Transiogram: A spatial relationship measure for categorical data. Int. J. Geogr. Inf. Sci. 2006, 20, 693–699.
  13. Stone, M. The opinion pool. Ann. Math. Stat. 1961, 32, 1339–1342.
  14. Goovaerts, P. Geostatistics for Natural Resources Evaluation; Oxford University Press: New York, NY, USA, 1997.
Figure 1. Jura lithology data set with four classes.
Figure 2. Lithofacies prediction of the corresponding 259 locations based on MAP probability criterion.
Figure 3. Prediction accuracy comparison.
Table 1. Descriptive statistics of transition probabilities (expert opinions).
| Transition Probability | Mean | Median | Maximum | Minimum | Standard Deviation |
|---|---|---|---|---|---|
| $P_{11}$ | 0.270 | 0.201 | 1.000 | 0.000 | 0.250 |
| $P_{12}$ | 0.290 | 0.292 | 0.632 | 0.000 | 0.173 |
| $P_{13}$ | 0.254 | 0.239 | 0.769 | 0.000 | 0.140 |
| $P_{14}$ | 0.186 | 0.193 | 0.436 | 0.000 | 0.123 |
| $P_{21}$ | 0.227 | 0.239 | 0.563 | 0.000 | 0.134 |
| $P_{22}$ | 0.332 | 0.282 | 1.000 | 0.091 | 0.198 |
| $P_{23}$ | 0.254 | 0.256 | 0.436 | 0.000 | 0.090 |
| $P_{24}$ | 0.187 | 0.192 | 0.387 | 0.000 | 0.088 |
| $P_{31}$ | 0.220 | 0.200 | 0.636 | 0.000 | 0.141 |
| $P_{32}$ | 0.313 | 0.309 | 1.000 | 0.000 | 0.155 |
| $P_{33}$ | 0.294 | 0.262 | 1.000 | 0.000 | 0.203 |
| $P_{34}$ | 0.173 | 0.185 | 0.469 | 0.000 | 0.119 |
| $P_{41}$ | 0.219 | 0.182 | 0.720 | 0.000 | 0.154 |
| $P_{42}$ | 0.312 | 0.318 | 0.562 | 0.000 | 0.142 |
| $P_{43}$ | 0.256 | 0.238 | 0.586 | 0.000 | 0.123 |
| $P_{44}$ | 0.213 | 0.184 | 1.000 | 0.000 | 0.208 |
