1. Introduction
Land-cover information is important for resource management and environmental modelling. A variety of land-cover information products (static and dynamic, crisp and soft) are generated from different sensor datasets at regional and global scales [1,2,3,4,5,6]. This research focuses on static land-cover information coded with discrete class labels rather than percent covers (also called fractional covers or class proportions). However, land-cover information is always inaccurate to some extent, because information about land-cover status and dynamics is not directly measurable; it results from complex processes of image and data analysis, interpretation, and reasoning, which are subject to various forms of uncertainty. Research efforts are increasingly directed towards describing, quantifying, and analyzing accuracies (or misclassification errors) in land-cover information [7,8,9,10,11,12,13].
Conventionally, classification accuracy is assessed using error matrices constructed from reference (validation) sample data. Various accuracy measures, such as the percentage of correctly classified (PCC) pixels (also termed overall accuracy), producer's accuracy, and user's accuracy, can be computed from error matrices [14,15]. At the same time, it is increasingly recognized that local (per-pixel) accuracy should be analyzed and estimated, so that users can better understand how misclassifications relate to characteristics of the landscapes being mapped and producers can pursue classifier improvements and information refinement. Spatial analyses, modelling, estimation, and applications concerning local accuracies in land-cover information have been discussed by various authors [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34], as reviewed below.
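As a reminder of how the conventional measures are obtained, the sketch below computes overall accuracy (PCC), producer's accuracies, and user's accuracies from an error matrix. It assumes that rows hold map labels and columns hold reference labels (conventions differ between texts), and the counts are made up for illustration.

```python
import numpy as np

def accuracy_measures(error_matrix):
    """Overall, producer's and user's accuracies from an error matrix.

    Assumed convention: rows = map (classified) labels,
    columns = reference labels, entries = sample counts.
    """
    m = np.asarray(error_matrix, dtype=float)
    correct = np.diag(m)
    overall = correct.sum() / m.sum()     # PCC (overall accuracy)
    producers = correct / m.sum(axis=0)   # per reference class (1 - omission error)
    users = correct / m.sum(axis=1)       # per map class (1 - commission error)
    return overall, producers, users

# Hypothetical 3-class error matrix; the counts are illustrative only.
cm = [[50, 5, 5],
      [10, 40, 0],
      [0, 5, 35]]
overall, producers, users = accuracy_measures(cm)
print(round(overall, 3))  # 0.833
```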
Research on local accuracy has focused on two major interrelated aspects: (1) local accuracy characterization through spatial and statistical analyses of accuracy-context associations, and (2) local accuracy estimation, which is usually based on sample data and empirically built accuracy models. Here, context is broadly defined to include map class labels, locations, and indices quantifying patterns of class occurrences, as is the case in this paper; it may also be defined in image data and feature space [16]. Classes can refer to static land-cover types or to their changes (e.g., forest loss and urban gain, as in [17]); this paper concerns the former. We review related work on these two aspects below.
Research on accuracy-context relationships has found that contextual features informative for explaining spatial variations in classification accuracy include spatial heterogeneity, patch size, and other landscape pattern indices [18,19,20]. Heterogeneity indicates the textural complexity of land-cover classes occurring in a neighborhood and generally includes compositional (the number and proportions of different classes) and configurational (the spatial arrangement of classes) types [21]. A few examples follow.
Land-cover heterogeneity and patch size were found to be important factors determining local accuracy in the United States National Land-Cover Data (NLCD) product [18,19]. Van Oort et al. established relationships between classification error and landscape characteristics, showing that the probability of correct classification decreases with increasing focal heterogeneity (in a 3 by 3 pixel neighborhood) and decreasing patch size [20]. Lechner et al. developed a statistical simulation model to test the effects of patch size and shape, classification threshold, and grid location on the classification accuracy of small and linear features, and found that patch size was an important factor affecting classification accuracy [22]. Chen et al. examined the relationships between the accuracies of crop classification and area estimation on the one hand and spatial heterogeneity (in particular, sample pixel impurity and landscape heterogeneity) on the other, and found that configurational heterogeneity affected area estimation more significantly than compositional heterogeneity [21]. As these studies show, complex landscapes (as indicated by increased heterogeneity, decreased dominance, smaller patch sizes, etc.) are likely to lead to more misclassifications. Misclassifications are also more likely with blurred remote-sensing images and poor class separability in feature space (e.g., [16]). Logistic models have commonly been used to describe statistical relationships between local accuracies and contextual or landscape patterns [18,19,20]. Research on local accuracy estimation (or prediction) is also growing, as reviewed below.
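To make compositional heterogeneity concrete, the following sketch computes two simple focal indices in a 3 by 3 window: class richness (the number of distinct classes) and the Shannon diversity of class proportions. These particular indices, and the choice to truncate the window at map borders, are illustrative assumptions rather than the indices used in [18,19,20,21].

```python
import numpy as np

def focal_composition(labels, size=3):
    """Per-pixel class richness and Shannon diversity in a size x size window.

    Assumption: at map borders only the part of the window inside the map
    is used (padding schemes differ between studies).
    """
    labels = np.asarray(labels)
    rows, cols = labels.shape
    half = size // 2
    richness = np.zeros((rows, cols), dtype=int)
    shannon = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            win = labels[max(i - half, 0):i + half + 1,
                         max(j - half, 0):j + half + 1]
            _, counts = np.unique(win, return_counts=True)
            p = counts / counts.sum()              # class proportions in the window
            richness[i, j] = len(counts)           # number of distinct classes
            shannon[i, j] = -np.sum(p * np.log(p))
    return richness, shannon

# Hypothetical 4 x 4 land-cover map with three classes.
land_cover = np.array([[1, 1, 1, 2],
                       [1, 1, 2, 2],
                       [1, 3, 2, 2],
                       [3, 3, 3, 2]])
richness, shannon = focal_composition(land_cover)
```

Configurational indices (e.g., patch size or edge density) would need connected-component or adjacency analysis and are not covered by this sketch.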
Various methods have been explored for estimating local accuracies. These include empirical modelling (e.g., logistic regression [17,23,24]), inverse distance weighting interpolation of accuracy measures computed from locally constrained error matrices [25,26], kernel functions [24], estimation based on local error matrices constructed by geographic weighting [27], kriging [28], and logistic-regression-kriging [29,30]. Maps displaying estimated per-pixel accuracies, such as probabilities of correct classification (or misclassification), user's accuracies (complements of commission errors), and producer's accuracies (complements of omission errors), have also been generated (see [17,23] for examples).
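The inverse distance weighting step can be sketched as follows: accuracy values attached to sample locations (for instance, overall accuracies of locally constructed error matrices) are interpolated to unsampled pixels with weights proportional to $1/d^p$. The power $p = 2$ and the toy inputs are assumptions for illustration, not settings taken from [25,26].

```python
import numpy as np

def idw(sample_xy, sample_values, target_xy, power=2.0):
    """Inverse distance weighted estimate at each target location."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    target_xy = np.atleast_2d(np.asarray(target_xy, dtype=float))
    # Distances between every target (rows) and every sample (columns).
    d = np.linalg.norm(target_xy[:, None, :] - sample_xy[None, :, :], axis=2)
    out = np.empty(len(target_xy))
    for k, dk in enumerate(d):
        hit = dk == 0
        if hit.any():                       # target coincides with a sample
            out[k] = sample_values[hit].mean()
        else:
            w = 1.0 / dk ** power
            out[k] = np.sum(w * sample_values) / w.sum()
    return out

# Hypothetical local accuracy values at two sample locations.
est = idw([[0, 0], [2, 0]], [1.0, 0.0], [[1, 0], [0, 0]])
```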
The aforementioned methods are mostly employed in the spatial domain (i.e., the user's domain), while some of them are also applicable in the spectral domain (i.e., the producer's domain) (e.g., kernel functions and logistic regression, as described in [24]). A useful method for local accuracy estimation in the spectral domain is calibration, which transforms classification certainty measures computed as intermediate results before the final output (such as maximum posterior probabilities) into accuracy indicators [31]. Local accuracy estimation has also been researched in combined spectral and spatial domains. For example, Steele et al. [28] formulated a concept of misclassification probability and presented a resampling-based method for estimating misclassification probabilities at training sample locations, from which misclassification probability estimates were then interpolated to a lattice of points via kriging. Other combined spectral and spatial methods are described in [16] and [29], which used spectral data and spectrally derived soft class probabilities, respectively, as the basis for modelling local accuracies.
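One simple way to calibrate maximum posterior probabilities is histogram binning: sample pixels are grouped by their maximum posterior probability, and each bin is assigned the observed proportion of correct classifications. This is offered only as an example of the calibration idea; it is not necessarily the method of [31], and the bin count and data are hypothetical.

```python
import numpy as np

def fit_binning_calibration(max_post, correct, n_bins=5):
    """Equal-width bins on [0, 1]; each bin stores the observed proportion of
    correctly classified samples whose maximum posterior falls in it.
    Empty bins are left as NaN."""
    max_post = np.asarray(max_post, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(max_post, edges[1:-1]), 0, n_bins - 1)
    table = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            table[b] = correct[in_bin].mean()
    return edges, table

def apply_calibration(max_post, edges, table):
    """Map maximum posterior probabilities to calibrated accuracy estimates."""
    idx = np.clip(np.digitize(np.asarray(max_post, dtype=float), edges[1:-1]),
                  0, len(table) - 1)
    return table[idx]

# Hypothetical training samples: max posterior and 1 = correctly classified.
edges, table = fit_binning_calibration([0.1, 0.15, 0.9, 0.95, 0.85], [0, 1, 1, 1, 0])
calibrated = apply_calibration([0.12, 0.99], edges, table)
```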
As this research is oriented to local accuracy estimation in the spatial domain, we elaborate on the relevant spatial-domain methods, most of which are mentioned above. A simple method is to compare map and reference class labels at sample locations, creating a map of misclassifications that helps to analyze their occurrence in the map being assessed. However, such an error location map does not provide complete coverage of misclassifications over the problem domain. For mapping per-pixel accuracies, Foody [25] proposed interpolating accuracy measures computed from locally constructed error matrices. This method relies heavily on the availability of relatively dense sample data (a sampling intensity of about 6% was employed in [25]). However, sampling intensities of, say, 2.5%, which are affordable for small areas (e.g., [24]), become prohibitive for large-area assessments (e.g., [17]). Developments of this method include geographical weighting and other extensions in the local construction of error matrices [23,27,32]. Beyond such methods, which use only the locational information contained in the sample data, logistic modelling that also uses contextual information for estimating local accuracy may be usefully explored (as in this paper), given the observation that the per-pixel probability of correct classification is closely related to contextual features characterizing patterns of map class occurrences in the neighborhood [18,19,20]. In fact, logistic regression has been implemented in both geographic space (e.g., locations) [24] and contextual feature space (e.g., contextual information about class occurrence patterns and landscape characteristics) [17]. The fitted logistic models can then be used to estimate per-pixel probabilities of correct classification and hence to generate maps showing spatially varying accuracies. See the work by Wickham et al. [17], Khatami et al. [24], and Zhang and Mei [30] for examples of using logistic models built on sample data and land-cover map data to estimate local accuracies.
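The idea of a locally constructed error matrix can be sketched as follows: for a target pixel, only the k nearest sample pixels are used, and accuracy measures are computed from their map and reference labels. The unweighted k-nearest-neighbour construction and the value of k are simplifying assumptions; Foody [25] and the geographically weighted extensions [23,27,32] differ in detail.

```python
import numpy as np

def local_accuracy(sample_xy, map_labels, ref_labels, target_xy, target_class, k=10):
    """Overall accuracy and the user's accuracy for target_class, computed from
    the error matrix of the k samples nearest to target_xy (hypothetical,
    unweighted local construction)."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    map_labels = np.asarray(map_labels)
    ref_labels = np.asarray(ref_labels)
    d = np.linalg.norm(sample_xy - np.asarray(target_xy, dtype=float), axis=1)
    near = np.argsort(d)[:k]                 # indices of the k nearest samples
    m, r = map_labels[near], ref_labels[near]
    overall = np.mean(m == r)                # local PCC
    same_class = m == target_class           # local samples mapped as target_class
    users = np.mean(r[same_class] == target_class) if same_class.any() else np.nan
    return overall, users

# Hypothetical samples along a transect.
xy = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0]]
overall, users = local_accuracy(xy, [1, 1, 2, 1, 1], [1, 2, 2, 2, 2],
                                target_xy=[0.5, 0], target_class=1, k=3)
```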
Having justified local accuracy estimation based on logistic modelling (which this research adopts), we next consider issues of sampling for collecting reference sample data, in particular the coupling of sampling designs and modelling approaches. Coupling modelling and sampling facilitates the integration of accuracy estimation and information refinement, where the latter uses information about local accuracies when fusing map and reference data to enhance the quality of fused maps [33,34]. This integrative framework represents the paper's major contribution to the literature, as explained below.
As in error-matrix-based accuracy assessment, reference (validation) sample data consisting of reference class labels, from which binary data indicating correct or incorrect classification at sample pixels are obtained, are necessary for model building. Empirically built models and their predictions are conditional on the specific sample data used for model training, which are collected following certain sampling designs. It is thus important to reflect on how logistic modelling has been combined with sampling in the past. The review below aims to indicate the largely loose coupling between modelling and sampling in the existing literature, though it is by no means comprehensive or detailed.
Smith et al. implemented logistic regression for characterizing local accuracy in the NLCD datasets in the eastern US, encompassing four regions across 21 states, with 5020 sample pixels (presumably collected with a region-stratified random sampling design), using a class-aggregated modelling strategy with models built separately for individual regions [18]. Smith et al. then carried out logistic modelling of local accuracies with a (map) class-stratified modelling strategy (stratification by map classes at both Levels I and II), using the same sample set of 5020 pixels [19]. Using a class-aggregated modelling strategy, Van Oort et al. used a sample of 1161 grid cells (collected with a form of near-systematic sub-sampling) to model and analyze the classification accuracy of agricultural crops in the Dutch national land-cover database [20]. Based on simple random sample data collected at a sampling intensity of about 5%, Zhang and Mei integrated logistic regression and geostatistics for local accuracy characterization in land-cover change information via class-aggregated modelling [30]. With stratified random sample data collected at intensities of 0.5% and 2.5%, Khatami et al. compared logistic modelling with other modelling approaches for estimating local accuracies in classified remote-sensing images, considering both class-aggregated and class-specific (i.e., class-stratified) modelling in the spatial or spectral domain [24]. Class-specific modelling was confirmed to provide more accurate estimates of local accuracies than class-aggregated modelling, as investigated in [18,19,24].
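The difference between class-aggregated and class-stratified modelling can be sketched with a small logistic regression routine: the aggregated approach fits one model to all samples, while the stratified approach fits one model per map class. Newton-Raphson (IRLS) is written out only to keep the example self-contained; in practice a library such as statsmodels or scikit-learn would be used. The covariate (standing in for, e.g., a heterogeneity index) and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton-Raphson (IRLS); returns coefficients with
    the intercept first. The tiny ridge term only keeps the Hessian invertible."""
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))       # current P(correct)
        W = p * (1.0 - p)
        H = X1.T @ (X1 * W[:, None]) + ridge * np.eye(X1.shape[1])
        beta = beta + np.linalg.solve(H, X1.T @ (y - p))
    return beta

def predict_logistic(beta, X):
    X1 = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return 1.0 / (1.0 + np.exp(-X1 @ beta))

def fit_class_stratified(X, y, map_class):
    """One model per map class (class-stratified) rather than a single
    class-aggregated model fitted to all samples."""
    X, y, map_class = np.asarray(X, dtype=float), np.asarray(y), np.asarray(map_class)
    return {c: fit_logistic(X[map_class == c], y[map_class == c])
            for c in np.unique(map_class)}
```

When accuracy-context relations differ between classes (for example, opposite slopes), a single pooled model can average them away, which is the kind of between-class inhomogeneity that class-stratified modelling is meant to accommodate.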
As reviewed above, once reference sample data are collected, logistic modelling can be performed in a (map) class-aggregated or class-stratified way. The latter is well suited to accommodating between-class inhomogeneity in accuracy-context relations, as demonstrated in [19], and has been confirmed to be more accurate than the former [24]. In addition to systematic and random (simple or stratified) sampling, which are among the most commonly used sampling designs, sampling adaptive to local class heterogeneity (e.g., class impurity in a 3 by 3 pixel focal neighborhood) has also been explored for accuracy assessment [35]. This is motivated by the observation that boundary areas (i.e., edge pixels) are more likely to be misclassified than inner areas (i.e., interior pixels), as amply demonstrated in the literature on local accuracy estimation [35]. Accuracy assessment based on sample data in which edge and interior pixels were treated separately showed large differences between the classification accuracies of edge-pixel and interior-pixel segments [36].
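A minimal way to separate edge from interior pixels is sketched below: a pixel is treated as an edge pixel if at least one of its 8 neighbours inside the map carries a different class label. This 8-neighbour rule is an assumption for illustration; the definitions used in [35,36] may differ (e.g., in connectivity or border handling).

```python
import numpy as np

def edge_mask(labels):
    """True for edge pixels (some in-map 8-neighbour has another label),
    False for interior pixels."""
    labels = np.asarray(labels)
    rows, cols = labels.shape
    edge = np.zeros((rows, cols), dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            # Centre pixels whose neighbour at offset (di, dj) lies inside the map.
            ctr = (slice(max(-di, 0), rows + min(-di, 0)),
                   slice(max(-dj, 0), cols + min(-dj, 0)))
            # The corresponding neighbour pixels.
            nbr = labels[max(di, 0):rows + min(di, 0), max(dj, 0):cols + min(dj, 0)]
            edge[ctr] |= labels[ctr] != nbr
    return edge

# Hypothetical map with a 2 x 2 patch of class 2 in one corner.
lab = np.array([[1, 1, 1, 1],
                [1, 1, 1, 1],
                [1, 1, 2, 2],
                [1, 1, 2, 2]])
edges = edge_mask(lab)
```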
Similar to the aforementioned error-matrix-based accuracy assessment, models of local accuracies may be built separately for contextually heterogeneous and homogeneous pixel segments (sub-strata) within individual map-class strata, with the aim of increasing the accuracy of model-based estimates. In other words, as an extension of class-stratified modelling, class-heterogeneity-stratified modelling can be usefully explored to handle within-stratum inhomogeneity in accuracy-context relations properly. This double-stratified method should also be applied to the sampling design for reference data collection, so that sampling and modelling are well coupled. More importantly, when double stratification is applied in sampling, heterogeneous sub-strata (which are usually more prone to misclassification than homogeneous sub-strata) are likely to be sampled at greater intensities than under designs without sub-stratification by heterogeneity. The increased number of sample pixels in error-prone locations will, in turn, enable detailed studies of misclassification patterns and facilitate direct correction of misclassification errors, refining land-cover information through fusion of map data and reference sample data. This broadens the usability of sample data beyond local accuracy estimation to information refinement. Therefore, the class-heterogeneity-stratified method for sampling and logistic regression modelling constitutes this paper's major contribution to the literature. Its main features and merits are a combined perspective on sampling and modelling (which have seldom been treated coherently in the past) and an integrative construct for local accuracy characterization and information refinement.
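The double-stratified sampling idea can be sketched as stratified random sampling over (map class, sub-stratum) cells. The allocation rule below (a fixed sample size per class, split between heterogeneous and homogeneous sub-strata by a share parameter) is a hypothetical choice for illustration, not the allocation used in this study. Because heterogeneous sub-strata are usually small, even an equal split samples them at a higher intensity than their area share.

```python
import numpy as np

def double_stratified_sample(map_labels, heterogeneous, n_per_class,
                             het_share=0.5, seed=None):
    """Flat pixel indices drawn by simple random sampling without replacement
    within each map class x {heterogeneous, homogeneous} sub-stratum."""
    rng = np.random.default_rng(seed)
    flat_labels = np.asarray(map_labels).ravel()
    flat_het = np.asarray(heterogeneous, dtype=bool).ravel()
    n_het = int(round(n_per_class * het_share))   # hypothetical allocation rule
    chosen = []
    for c in np.unique(flat_labels):
        for is_het, n in ((True, n_het), (False, n_per_class - n_het)):
            pool = np.flatnonzero((flat_labels == c) & (flat_het == is_het))
            n = min(n, pool.size)   # cannot draw more pixels than the sub-stratum holds
            chosen.append(rng.choice(pool, size=n, replace=False))
    return np.concatenate(chosen)

# Hypothetical 10 x 10 map: class 1 on top, class 2 below, boundary rows heterogeneous.
lab = np.repeat([1, 2], 50).reshape(10, 10)
het = np.zeros((10, 10), dtype=bool)
het[4:6, :] = True
idx = double_stratified_sample(lab, het, n_per_class=10, seed=1)
```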
As the first step towards building the aforementioned integrative framework, this paper investigates the performance of the proposed double-stratified method (featuring class-heterogeneity stratification in both logistic modelling and sampling) in comparison with alternative methods (i.e., logistic regression modelling and sampling that are not class-heterogeneity-stratified). This is important because the proposed method must first be shown to be competitive for local accuracy estimation to be worth pursuing further for information refinement. In addition to comparing the proposed and alternative methods on a separate model-testing sample, we analyzed their sensitivities to sample size, examining their robustness to varying sample sizes. This sensitivity analysis represents another contribution of this research, as it has rarely been considered in similar studies. As shown in the case study, according to statistical testing and sensitivity analyses, the proposed class-heterogeneity-stratified method generates significantly more accurate estimates of local accuracies than the alternative methods, including a double-stratification method with sub-stratification into edge vs. interior pixels (as described in [36]).
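The general shape of a sample-size sensitivity analysis is sketched below: models are repeatedly fitted to random subsamples of a given size and scored on a fixed test sample, and the mean and spread of the scores are tracked across sizes. To stay short, the "model" here is simply the proportion correct in each (sub-)stratum, and the score is the Brier score; both are assumptions for illustration, not the models, metrics, or statistical tests of this study.

```python
import numpy as np

def stratum_mean_model(strata, correct):
    """P(correct) per (sub-)stratum estimated as the sample proportion correct."""
    strata = np.asarray(strata)
    correct = np.asarray(correct, dtype=float)
    return {s: correct[strata == s].mean() for s in np.unique(strata)}

def brier(model, strata, correct, default=0.5):
    """Mean squared difference between predicted P(correct) and 0/1 outcomes;
    strata absent from the training subsample get a default prediction."""
    p = np.array([model.get(s, default) for s in strata])
    return np.mean((p - np.asarray(correct, dtype=float)) ** 2)

def sensitivity(train_strata, train_correct, test_strata, test_correct,
                sizes, n_rep=50, seed=0):
    """Mean and standard deviation of test Brier scores over repeated random
    training subsamples of each size."""
    rng = np.random.default_rng(seed)
    train_strata = np.asarray(train_strata)
    train_correct = np.asarray(train_correct)
    results = {}
    for n in sizes:
        scores = []
        for _ in range(n_rep):
            pick = rng.choice(len(train_strata), size=n, replace=False)
            model = stratum_mean_model(train_strata[pick], train_correct[pick])
            scores.append(brier(model, test_strata, test_correct))
        results[n] = (np.mean(scores), np.std(scores))
    return results
```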
The remainder of the article is organized as follows. In Section 2, the study area and data are described first, followed by the methods for sampling and logistic regression modelling, in particular those with double stratification by class and heterogeneity. Section 3 describes the experiment carried out and the results obtained, testing the proposed method against alternative methods. Section 4 discusses some issues, and Section 5 concludes the paper.
4. Discussion
As shown by the results of this study, the proposed class-heterogeneity-stratified method (applied jointly for sampling and logistic modelling) was the most accurate for estimating local accuracies in comparison with the other methods. Sensitivity analyses also showed the proposed method's effectiveness and robustness, indicating a fair level of reliability. This study has thus met its goal of testing the proposed method's performance in local accuracy estimation, as the first step towards building an integrative framework for accuracy estimation and information refinement.
Below, we reflect on some aspects of the work reported in the paper and briefly outline further work.
Firstly, residuals of logistic regression predictions were not analyzed for spatial correlation in this paper, nor was logistic-regression-kriging explored for mapping local accuracies as in [30]. In the logistic model in Equation (2), $p(x)$ represents the probability of correct classification (i.e., agreement between map and reference class labels) at pixel $x$. In fact, $p(x)$ is the mean of a binary variable $I(x)$ indicating whether $x$ is correctly classified: $p(x) = E[I(x)]$. Logistic-regression-kriging can thus be viewed as kriging with local means to estimate $I(x)$, with logistic regression predicting the local means and kriging transferring the spatial information contained in the residuals, $I(x) - p(x)$, from sampled to unsampled locations [46]. Logistic-regression-kriging certainly merits consideration for mapping local accuracies, especially when regression residuals are spatially correlated (and hence should be incorporated to improve the estimation of local accuracies). However, given the paper's future orientation towards information refinement (after local accuracy estimation), it makes sense, when pursuing data fusion in the future, to perform kriging directly on the land-cover data concerned rather than on indicator data representing classification correctness. Another reason for not pursuing kriging in this paper is the extra computational cost of implementing kriging after logistic regression, since the sensitivity analysis, a relatively novel aspect of this study, was already computationally expensive.
Secondly, double-stratified modelling combined with double-stratified sampling was confirmed in this study to be the most accurate for local accuracy estimation, given the same sample sizes. However, it is worth exploring how sampling may be optimally configured (beyond double stratification) with respect to information refinement [33,34], which is the top priority for future work. Related to this is the question of how to determine the optimal sample size for a specific study area given the budget for reference data collection. Furthermore, it is important to devise methods for the combined use of all available reference data to improve accuracy characterization and data fusion, regardless of the designs (which may be more complex than those in this study) under which they were originally collected. We acknowledge that there is considerable room for improving and extending the work in this paper, sampling being itself a topic of great breadth and depth.
Thirdly, some technical aspects are worth further exploration. On the one hand, given that double-stratified modelling tends to become complicated, with many more models to build than in class-aggregated (CA) modelling, and that sub-stratified models yield largely homogeneous estimates over their corresponding data sub-strata, it seems sensible to explore simplifying sub-stratified models (e.g., using sub-stratum means) without significantly compromising estimation accuracy. On the other hand, it is worth extending the double-stratification method to the temporal domain, for instance by investigating how it may be used to characterize per-pixel accuracies in land-cover change [17,30].
Fourthly, we discuss threshold selection for defining heterogeneous vs. homogeneous sub-strata, which is essential for the proposed Method EO. As described in Section 2.2, the threshold for a pixel to be assigned to the homogeneous sub-stratum was 4 pixels in a 3 by 3 pixel neighborhood. This means that the center pixel's class needs to be in the majority (no less than 5 of the 9 pixels) for the pixel to be considered to lie in a relatively homogeneous neighborhood. Note that homogeneity is defined as the number of pixels in the focal neighborhood with the same class label as the center pixel (see also Table 2). Unlike first-level stratification by map classes, which are fixed for a given map, sub-stratification into heterogeneous vs. homogeneous pixel segments within a stratum can be made more adaptively, as the appropriate threshold depends on the land-cover mosaic (cover types, patch shape, landscape texture, etc.) depicted in the map being assessed. By adaptively selecting homogeneity thresholds, sub-stratification could be optimized to maximize the reliability of estimated local accuracies under the constraints of sampling intensity and sample size. This issue of threshold selection is certainly worth exploring in future research.
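The homogeneity count and threshold described above can be sketched as follows, under the reading that the count covers the 8 neighbours of the center pixel, so that a threshold of 4 corresponds to the center class holding at least 5 of the 9 window pixels. Treating out-of-map positions as non-matching (so that map-border pixels have fewer possible matches) is an additional assumption; Section 2.2 and Table 2 govern the actual definition.

```python
import numpy as np

def homogeneity(labels):
    """Number of the 8 neighbours sharing the centre pixel's class label.

    Assumes non-negative integer labels; the map is padded with -1 so that
    positions outside the map never count as matches.
    """
    labels = np.asarray(labels, dtype=int)
    padded = np.pad(labels, 1, constant_values=-1)
    rows, cols = labels.shape
    count = np.zeros((rows, cols), dtype=int)
    for di in range(3):
        for dj in range(3):
            if (di, dj) == (1, 1):
                continue                      # skip the centre pixel itself
            count += padded[di:di + rows, dj:dj + cols] == labels
    return count

def sub_stratum(labels, threshold=4):
    """'hom' if at least `threshold` neighbours match the centre pixel
    (centre class holds >= threshold + 1 of the 9 pixels), else 'het'."""
    return np.where(homogeneity(labels) >= threshold, "hom", "het")

# Hypothetical map: uniform class 1 with a single class-2 pixel in the middle.
lab = np.ones((5, 5), dtype=int)
lab[2, 2] = 2
h = homogeneity(lab)
s = sub_stratum(lab)
```

Making `threshold` a parameter reflects the point above that the threshold could be chosen adaptively for the land-cover mosaic at hand.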
Fifthly, we discuss potentially useful methods for per-pixel accuracy estimation in soft (subpixel) classifications [2,5]. Soft classifications are often considered a kind of fuzzy classification, although "fuzzy" is conceptually more general than "soft", as the former also covers cases in which the classes themselves are vaguely defined (e.g., the severity of drought). For soft classifications representing subpixel proportions of candidate classes in individual pixels, however, probabilistic interpretations seem more relevant. With this understanding, we assume numerical equivalence (or, more correctly, similarity) between subpixel class proportions and fuzzy membership values in the following discussion. Comber et al. [32] is a pioneering work on per-pixel accuracy estimation for fuzzy/soft classifications, while Khatami et al. [47] is a more recent contribution. For such maps, the per-pixel accuracy measures in [32,47] were differences between map and reference membership values (or class proportions) for a candidate class, denoted $D$ (the former used absolute differences, $|D|$). In both papers, spatial interpolation was applied to generate surfaces of weighted moving-window means of the $D$ values at sample locations, with weights computed by distance-based kernels in a manner similar to geographically weighted regression (GWR). The proposed method is not directly applicable to estimating local accuracies in fuzzy maps. Two extensions would be required: one concerns adaptations to regression modelling, and the other concerns how heterogeneity is defined on fuzzy maps to facilitate double stratification. Regression modelling needs to account for the fact that accuracy measures applied to fuzzy maps are no longer probabilities of correct classification but continuous-valued differences $D$. Thus, regression models for continuous responses, rather than logistic regression, may be explored. See the work by Shortridge and Messina [48] for an example of analyzing continuous-valued errors in the Shuttle Radar Topography Mission (SRTM) DEM and their associations with globally available topographic and land-cover variables across a wide range of landscapes in the United States, although it did not concern classifications per se. On the other hand, definitions of heterogeneity vs. homogeneity should be reviewed in the context of fuzzy maps and their local accuracy modelling [49]. Continua of heterogeneity-homogeneity seem to be closely related to class proportions (or fuzziness in class memberships), although the relations are not yet well understood. Once heterogeneity is defined with proper thresholds, double stratification may be implemented: sub-strata of heterogeneity vs. homogeneity are based on the chosen thresholds, while strata of prototype map classes are based on alpha-cuts [49,50] or maximum membership values (or dominant classes' proportions) [51].
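The kernel-weighted moving-window mean of $D$ used for soft maps can be sketched as follows. A Gaussian distance-decay kernel with a fixed bandwidth is assumed here for illustration; the kernels and bandwidths actually used in [32,47] may differ. The `absolute` flag switches to $|D|$.

```python
import numpy as np

def kernel_mean_surface(sample_xy, D, target_xy, bandwidth=10.0, absolute=False):
    """Gaussian-kernel weighted mean of map-minus-reference membership
    differences D (or |D|) at each target location."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    D = np.asarray(D, dtype=float)
    if absolute:
        D = np.abs(D)                          # |D| variant
    target_xy = np.atleast_2d(np.asarray(target_xy, dtype=float))
    d = np.linalg.norm(target_xy[:, None] - sample_xy[None], axis=2)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)    # Gaussian distance-decay weights
    return (w @ D) / w.sum(axis=1)

# Hypothetical differences at two sample pixels, evaluated midway between them.
mean_D = kernel_mean_surface([[0, 0], [2, 0]], [0.2, -0.2], [[1, 0]])
mean_abs_D = kernel_mean_surface([[0, 0], [2, 0]], [0.2, -0.2], [[1, 0]], absolute=True)
```

With a very small bandwidth relative to sample spacing, all weights can underflow to zero and the result becomes undefined, so the bandwidth has to be chosen with the sampling density in mind.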
Last but not least, it should be recognized that estimated local accuracies are themselves uncertain. This is because the models obtained (i.e., the significant explanatory variables and model parameters) were conditional on the specific sample data used, as shown in Table 4, even for sample data with the same sampling design and sample size (Table 7). Reference sample data quality [52] is also a factor, although reference data in this research were assumed to be accurate. More importantly, the explanatory variables used in this research were derived from map data known to contain unavoidable misclassification errors. Estimated local accuracies are therefore subject to two levels of propagated uncertainty: from map data to explanatory variables (i.e., contextual features or landscape pattern indices computed from the land-cover map being assessed), and from explanatory variables to logistic-modelling-based estimates. Local accuracy estimation (reported in this paper and elsewhere) is thus by no means perfect, no matter how sophisticated the methods employed. It is important to develop and promote methods that not only depict spatially varying accuracies in land-cover information products but also support uncertainty analyses of predicted per-pixel accuracies.
We discuss the aforementioned two-level uncertainty further in the remainder of this section. As mentioned above, unless sufficiently accurate field-measured data are used [53], spatial analyses and modelling based on remote-sensing data and the land-cover information derived from them are subject to uncertainty. There is a substantial literature on uncertainty in landscape pattern indices (or metrics) and landscape analyses due to misclassification errors in land-cover maps [54,55,56,57,58,59]. As landscape pattern indices were used as explanatory variables for logistic modelling of accuracy in this paper, methods from this literature may be usefully explored for analyzing the sensitivity of relevant pattern indices to misclassification errors.
However, future research needs to go further, analyzing and quantifying uncertainty in local accuracies estimated from map-derived pattern indices. The relevant literature is rather limited, especially regarding uncertainty in logistic-modelling-based accuracy estimation. Nevertheless, the literature on errors-in-variables in regression analysis may shed light on the two-level uncertainty, while simulation-based error modelling is another promising methodology worth exploring. Regarding regression modelling with errors-in-variables, Zhang et al. [60] and Fu et al. [61] addressed the errors-in-variables issue in forest inventory using linear regression based on remote-sensing data known to contain errors. Literature on errors-in-variables in logistic regression is more relevant to furthering this research, as logistic regression is designed for binary response variables (e.g., agreement/disagreement between map and reference labels); the work by Carroll and Wand [62] and Yi et al. [63] may serve as good starting points. On the other hand, simulation-based approaches (e.g., [59]) also merit consideration. Land-cover maps containing misclassification errors can be simulated and used to generate a large set of maps of estimated local accuracies. Statistical summaries and analyses of these local accuracy maps can provide useful information about the effects of map inaccuracy on per-pixel accuracy estimates, supporting uncertainty-informed local accuracy estimation and information refinement. Simulation-based methods need to be adapted to facilitate conditioning on reference sample data, to accommodate spatial correlation in misclassification errors in the land-cover maps being assessed, and to promote mechanism-based uncertainty analyses [64].
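A bare-bones version of the simulation idea is sketched below: the map is perturbed by random label flips, the explanatory variable (here, the neighbour-agreement count) is recomputed, a logistic model with fixed coefficients is applied, and the per-pixel mean and standard deviation of the resulting $P(\text{correct})$ are summarized. Everything here is hypothetical: the coefficients, the single covariate, and especially the spatially independent flips. The sketch deliberately omits the conditioning on reference data and the spatially correlated errors called for above.

```python
import numpy as np

def neighbour_agreement(labels):
    """Count of the 8 neighbours sharing the centre pixel's label
    (non-negative integer labels assumed; padding with -1 never matches)."""
    labels = np.asarray(labels, dtype=int)
    padded = np.pad(labels, 1, constant_values=-1)
    rows, cols = labels.shape
    count = np.zeros((rows, cols), dtype=int)
    for di in range(3):
        for dj in range(3):
            if (di, dj) != (1, 1):
                count += padded[di:di + rows, dj:dj + cols] == labels
    return count

def simulate_accuracy_uncertainty(labels, error_rate, beta, n_sim=100, seed=0):
    """Per-pixel mean and std of P(correct) over maps perturbed by independent
    random misclassification; beta = (intercept, slope) of a hypothetical
    logistic model on the neighbour-agreement count. Needs >= 2 classes."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=int)
    classes = np.unique(labels)
    idx = np.searchsorted(classes, labels)       # class index of every pixel
    sims = np.empty((n_sim,) + labels.shape)
    for s in range(n_sim):
        flip = rng.random(labels.shape) < error_rate
        # A flipped pixel takes a different class, drawn uniformly at random.
        offsets = rng.integers(1, len(classes), size=labels.shape)
        perturbed = np.where(flip, classes[(idx + offsets) % len(classes)], labels)
        z = neighbour_agreement(perturbed)
        sims[s] = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * z)))
    return sims.mean(axis=0), sims.std(axis=0)

# Hypothetical two-class map and made-up logistic coefficients.
lab = np.zeros((10, 10), dtype=int)
lab[:, 5:] = 1
mean_p, sd_p = simulate_accuracy_uncertainty(lab, 0.2, (-2.0, 0.6), n_sim=50)
```

Pixels with large `sd_p` are those whose estimated accuracy is most sensitive to errors in the map-derived explanatory variable.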