1. Introduction
In remote sensing and GIScience, research on errors in geospatial information and analyses aims to describe, model, propagate, visualize, and manage errors systematically through suitable metrics and methods [1,2,3,4,5,6,7,8,9,10]. For positional errors, in particular, there has been an impressive accumulation of work investigating the aforementioned topics [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]. With the advent of high spatial resolution (HSR) remote-sensing images, the emerging technology of the generalized point cloud dataset [32], and growing demands for high-accuracy geospatial information, research on positional errors in images and geospatial data and their impacts upon image co-registration and other applications is becoming increasingly important, as reflected in relevant literature [33,34,35,36,37,38,39,40,41,42,43,44].
This research focuses on modeling positional errors in remote-sensing images, especially those of HSR, to facilitate further studies on uncertainty in image-based information extraction and applications. Positional errors refer to the differences between measured and reference (assumed true) coordinates of the objects concerned (i.e., image pixels, in the context of this research). They are bivariate in X and Y coordinates for horizontally positioned points. The related issues are examined in more detail below.
Well-known positional-error descriptors include error ellipses for points and epsilon error bands for lines [20,25,30]. There has also been work on local descriptors of positional errors in 2D [45] and 3D [46]. However, such error descriptors do not lend themselves to modeling spatial uncertainty in spatial queries and analyses involving two or more points. This is due to the presence of spatial correlation in positional errors, which precludes the simple extension of error ellipses to modeling errors in lines (e.g., road centerlines) and areas (e.g., land parcels) by applying the law of variance and covariance propagation [10]. A more sensible method is stochastic simulation, whereby equal-probable realizations of the underlying regionalized variables (a well-known geostatistical term) are generated to facilitate error propagation in spatial analyses and applications [4,7,12]. This method is, theoretically, more rigorous than methods that assume independent and identically distributed positional errors [47,48,49,50,51].
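As a small illustration of why spatial correlation matters for anything involving two points, the sketch below applies the standard variance and covariance propagation rule to the difference of two X-coordinate errors. The function name and the numbers are hypothetical and are not taken from the study; they only show how strongly the result depends on the correlation term.

```python
import math

def sd_of_coordinate_difference(sd1, sd2, rho):
    """Standard deviation of (e2 - e1) for two errors with correlation rho.

    Uses Var(e2 - e1) = sd1^2 + sd2^2 - 2 * rho * sd1 * sd2.
    """
    return math.sqrt(sd1 ** 2 + sd2 ** 2 - 2.0 * rho * sd1 * sd2)

# Hypothetical values: both endpoints of a segment have a 1.0 m error in X.
independent = sd_of_coordinate_difference(1.0, 1.0, rho=0.0)  # errors assumed independent
correlated = sd_of_coordinate_difference(1.0, 1.0, rho=0.9)   # strongly correlated errors

print(f"independent: {independent:.3f} m, correlated: {correlated:.3f} m")
```

With positively correlated endpoint errors, the uncertainty of the coordinate difference is much smaller than the independence assumption suggests, which is why descriptors built point by point cannot simply be extended to lines and areas.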
Variogram-based geostatistical simulation is conventionally used for the stochastic simulation of regionalized variables, such as positional errors [52]. This is non-trivial if co-simulation for bivariate positional errors (X and Y coordinates) is to be carried out, as auto- and cross-covariance for errors in X and Y coordinates need to be modeled.
Multi-point geostatistics (MPS) is better suited for simulating regionalized variables with complex patterns and for use in various applications [53,54,55,56,57]. Instead of variogram models, MPS uses training images (TIs) as templates of spatial structures that are deemed representative of the problem domains. MPS produces conditional realizations honoring the high-order statistics, as opposed to traditional two-point statistics (e.g., variograms), represented in univariate or multivariate TIs. As an important technique in MPS, the direct sampling (DS) algorithm has advantages for modeling both categorical and continuous variables and is able to handle multivariate co-simulation [58]. For DS, both regularly spaced and completely informed TIs and irregularly sampled and incompletely informed training data (TD) can be used for simulation [59]. As a refinement to the conventional DS algorithm, QuickSampling (QS) [60] is advantageous in terms of the improved computational efficiency and flexibility of using TI or TD. To the best of our knowledge, MPS in general and DS in particular have rarely been pursued for positional-error modeling. This research seeks to fill this important niche.
To proceed with MPS-based simulation, there are two hurdles to overcome. One concerns the handling of non-stationarity in positional errors in images, and the other is about a cost-effective strategy for constructing TIs or TD. These issues are discussed below.
A key feature for positional errors in images is their systematic components (i.e., trends or local means). The trends are often major components, which result from image tilts or satellite attitude oscillations [61,62], terrain relief [37,40,63], or surface undulation [46,64] and are likely to be present even after initial image georectification, as discussed in [65], where image vendor-supplied sensor models in the form of rational polynomial coefficients (RPCs) were not sufficiently accurate. For images flown on different sensors and over different areas, positional errors and their trends are bound to be different (i.e., non-stationary), precluding the transferability of TD built on particular types of images over specific training areas to different areas, even when the same types of sensors/images are used. Thus, trends need to be accounted for in positional-error modeling so that de-trended errors (i.e., residuals) may become stationary in terms of spatial statistics and, thus, can be effectively used for MPS.
Trend-surface analysis may be performed using various methods, including polynomials and thin-plate smoothing splines [66,67,68]. In the geolocation and co-registration of satellite images [69], thin-plate splines, as special types of radial basis functions, are often used [70,71]. In this research, thin-plate splines were employed to decompose positional-error fields into trends and residuals. This points to the methodology of generalized additive modeling (GAM) [72,73], which (being a machine learning method) is a generalized linear model with a linear predictor involving a sum of smooth functions, including thin-plate smoothing splines. GAM was well-suited for this research, as it can be implemented either via smoothing splines or partial splines incorporating extra explanatory variables [67].
As mentioned above, the other issue concerns TD, which need to be furnished for MPS simulation. Note that TD for residuals is required here, since de-trending of positional errors is necessary, as discussed above. Manual image digitization for densely sampled TD is time-consuming and may be feasible only for very small study sites. Relatively dense TD may be cheaply obtained by using the technique of digital-image matching [74,75,76,77,78,79,80], given the existence of reference images. However, this technique may not be easily used for HSR images that feature complicated urban fabrics, even when they are initially georectified, as was the case for this study. This is because the so-called orthoimages that are routinely produced are often only georectified using digital elevation models (DEMs) rather than digital surface models (DSMs) and, thus, they are not truly orthorectified images, especially those in the presence of non-terrain 3D spatial entities (e.g., buildings) [33,40,81,82]. Image displacements due to surface undulation are often complicated by image occlusions. In turn, these occlusions give rise to ambiguities and errors in the identification of homologous image points through computer image matching and, hence, errors in the derivation of accurate image displacements, not to mention other kinds of complexity, such as shadows and unwanted objects in images and the effects of temporality between the reference images and the test images.
Clearly, the combined use of automatic image correlation and visual screening (for raw displacement data cleaning) provides a feasible solution in generating relatively dense and quality-enhanced TD. Therefore, the strategy in this study was to furnish relatively dense TD by using digital-image correlation to identify homologous image points and to measure positional errors. This was followed by visually screening raw displacement data to filter out gross errors that were due to the misidentification of homologous points and contamination by non-terrain image objects, which are common in HSR images.
The main contribution of this research lies in the novel use of MPS (DS in particular), GAM, and digital-image correlation to characterize positional errors in HSR images, where DS functions as a non-parametric and multivariate simulator for positional errors, GAM de-trends positional errors, and digital-image correlation (followed by visual screening) constructs relatively dense and quality-enhanced TD. Simulated positional errors will facilitate error propagation in applications, such as the extraction of road centerlines from images. In addition, with mean errors in positions computed from error realizations, reference positions for positions of interest can be estimated through error correction, providing a method of enhanced georectification.
2. Materials and Methods
The flowchart for this research on positional-error characterization is shown in Figure 1. A set of homologous image points was manually digitized and the errors in their X and Y coordinates were calculated, resulting in reference sample data from which reference values of positional errors were derived. The reference dataset consisted of a reference sample of 581 points (for model training in GAM and as conditioning data (CD) in MPS) and a test sample of 60 points. Image correlation was applied to generate relatively dense raw measurements of positional errors (i.e., raw displacements), which were then screened to filter out erroneous points, resulting in quality-enhanced displacement data, a subset of which were used as TD in this research. GAM was used to extract trend surfaces and, thus, residuals for both CD and TD. Using de-trended TD and CD, realizations of residuals and, hence, positional errors were generated at a grid of 80 m resolution. The metrics of positional errors (i.e., means, standard deviation, and cross-correlation) were then generated by summarizing these positional-error realizations. Realizations of errors along road centerlines (RCLs) were obtained by interpolation over the aforementioned simulated error surfaces, while the error metrics for RCLs could be computed and recorded as extra line attributes. Mean reference positions for RCLs could also be estimated from their digitized versions on the test image and from positional errors derived from DS simulation through error correction. Using test sample data, the simulated reference RCLs could be assessed with respect to their accuracy. A further description of these methods is provided below.
2.1. The Study Area and Datasets
This research was based in the Shanghai municipality, China. Shanghai is a major coastal city, being China’s commercial, financial, industrial, and trading center. Its terrain undulation is modest, suggesting little impact of terrain relief therein on image displacements. A square area of 40 km by 40 km was chosen as the study site, as shown in Figure 2.
A georectified image subset (panchromatic) of 0.5 m spatial resolution, acquired during the summer of 2020, was used as the reference image, as shown in Figure 2a. A ZY-3 satellite image subset (panchromatic band), acquired on 21 February 2020 and resampled to 2.0 m resolution, was used as the experiment test image (often known as the target image or the sensed image in relevant literature), as shown in Figure 2b. The images were initially georectified and radiometrically corrected, being part of the data sources for building up the generalized point cloud dataset described in [32].
A total of 641 homologous image points were visually located and digitized on the reference-test image pair, 581 of which (shown as green dots in
Figure 2a) were used as CD in MPS and also for model training in GAM and variogram modeling, while 60 of which (shown as red dots in
Figure 2b) were used as test sample data. The model-training sample pixels seemed to be regularly distributed, as shown in
Figure 2a. In fact, they were only approximately so. In principle, any well-defined image pixels can be candidates for training pixels. However, our preliminary tests revealed that regularly distributed training pixels were advantageous for effectively de-trending positional errors to enhance stationarity in resultant residuals, thereby facilitating effective DS-based simulation through constructing TD from a sub-area rather than the whole study area. Major roads are also depicted in
Figure 2b; they are used to illustrate error propagation and positional-error correction. To ensure accuracy in the reference sample data (of positional errors), centers of low houses (mostly in the rural areas), centers of landmarks, and road intersections were measured by referring to the generalized point cloud dataset [32] and OSM road network data. Note that the use of the generalized point cloud dataset was limited, in the sense that only browsing was permitted, because it is considered classified data at this stage.
The means and the standard deviations for positional errors in the X and Y coordinates in the experiment reference-test image pair are reported in Table 1; they were estimated based on the model-training data (i.e., the CD) and the test sample data, respectively. As indicated in Table 1, the metrics of positional errors were mostly similar between those of the training sample data and the test sample data, except for the minimum error in X and the standard deviation in Y.
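The summary statistics of the kind reported in Table 1 follow directly from the definition of positional error as measured minus reference coordinates. A minimal sketch is given below; the coordinates are made up for illustration and the function names are hypothetical, not part of the study's workflow.

```python
import statistics

def positional_errors(measured, reference):
    """Per-point (dx, dy) errors, defined as measured minus reference coordinates."""
    return [(mx - rx, my - ry) for (mx, my), (rx, ry) in zip(measured, reference)]

def summarize(errors):
    """Mean, sample standard deviation, minimum, and maximum for each coordinate."""
    summary = {}
    for name, values in (("x", [e[0] for e in errors]), ("y", [e[1] for e in errors])):
        summary[name] = {
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),
            "min": min(values),
            "max": max(values),
        }
    return summary

# Hypothetical homologous points (metres), not data from the study.
measured = [(102.0, 198.5), (300.5, 401.0), (499.0, 603.5)]
reference = [(100.0, 200.0), (300.0, 400.0), (500.0, 600.0)]

errors = positional_errors(measured, reference)
stats = summarize(errors)
print(stats)
```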
2.2. Semi-Automatic Construction of TD
As described in the Introduction, a combination of image correlation (for collecting raw displacement data) and visual screening provided a feasible solution for obtaining relatively dense and reliable TD for MPS.
By digital image correlation, homologous image points in a reference-test image pair were identified using image-matching algorithms, for which a formula for matching cost is provided in Appendix A. This allowed the differences between the reference image and the test image (i.e., displacements or positional errors of points in the test image relative to the assumed reference image) to be estimated automatically. The software package MicMac (version 1.0.beta14) [78] was used for automatically measuring displacements of features on the test image with respect to the reference image.
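To make the idea of correlation-based displacement measurement concrete, the sketch below searches for the integer shift that maximizes zero-mean normalized cross-correlation (NCC) between a reference patch and candidate test patches. NCC is assumed here as a common matching score; it is not claimed to be the matching cost given in Appendix A or the procedure implemented in MicMac, and all function names and window sizes are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_displacement(reference, test, row, col, half=3, search=4):
    """Return (best_score, d_row, d_col) for the reference patch centered at (row, col).

    The caller must keep the patch and search window inside both images.
    """
    template = reference[row - half:row + half + 1, col - half:col + half + 1]
    best = (-2.0, 0, 0)
    for d_row in range(-search, search + 1):
        for d_col in range(-search, search + 1):
            r, c = row + d_row, col + d_col
            candidate = test[r - half:r + half + 1, c - half:c + half + 1]
            if candidate.shape != template.shape:
                continue  # window fell off the image edge
            score = ncc(template, candidate)
            if score > best[0]:
                best = (score, d_row, d_col)
    return best

# Synthetic check: the test image is the reference shifted by 2 rows and -1 column.
rng = np.random.default_rng(0)
reference = rng.random((40, 40))
test = np.roll(reference, shift=(2, -1), axis=(0, 1))
score, d_row, d_col = match_displacement(reference, test, 20, 20)
print(score, d_row, d_col)
```

On real HSR imagery the correlation peak can be ambiguous over occlusions, shadows, and repetitive textures, which is the reason the raw displacements are screened afterwards.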
To filter out gross errors in the aforementioned raw displacement data, it was necessary to select well-defined, easy-to-locate, ground-level entities that were visible on both the reference and test images and free of gross errors due to surface-undulation displacements, image point mismatches, and other artifacts. This implied that image segments over water bodies, building sides, wooded land, shadows, and other areas of uncertainty needed to be masked out. This process was assisted in this study by using the generalized point cloud dataset [32] and OSM road network data, with the understanding that road surfaces visible on images were not subject to displacements due to terrain relief when the experiment images were initially georectified.
2.3. GAM for De-Trending Positional Errors
As mentioned in the Introduction, systematic components are usually present and often of major proportion in positional errors, even if the images are initially georectified. We needed to de-trend positional errors (for both TD and CD) as a way to handle non-stationarity [54,56] for geostatistical simulation-based error modeling.
For this research, the R package Mixed GAM Computation Vehicle with Automatic Smoothness Estimation (MGCV) [73] was used for GAM-based trend-surface fitting for CD and TD. In MGCV, the appropriate smoothness of each applicable model term (here, thin-plate splines) is selected using prediction-error criteria or likelihood-based methods. The prediction-error criterion used is generalized cross validation (GCV) when the scale parameter is unknown. Thus, GAM estimates the degree of smoothness as part of model fitting [72]. Thin-plate splines are described in Appendix A.
With GAMs fitted to the data, trend surfaces were predicted over the study area and over the extent of the TD, respectively. Then, residuals were computed at locations of CD and TD, accordingly. De-trended CD and TD were used in DS, as discussed below.
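The trend-plus-residual decomposition can be sketched with a plain thin-plate smoothing spline. The Python code below is a stand-in for illustration only: it is not the MGCV implementation, it uses a hand-picked smoothing parameter `lam` instead of GCV-based selection, and the sample data and function names are hypothetical. Each error component (X or Y) would be de-trended separately in this way.

```python
import numpy as np

def _tps_kernel(r):
    """Thin-plate spline basis phi(r) = r^2 * log(r), with phi(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz] ** 2 * np.log(r[nz])
    return out

def fit_tps(xy, z, lam):
    """Fit a 2D thin-plate smoothing spline; returns (radial weights, affine coefficients)."""
    n = xy.shape[0]
    K = _tps_kernel(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2))
    P = np.hstack([np.ones((n, 1)), xy])          # affine part: 1, x, y
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = K + lam * np.eye(n)               # lam > 0 trades fidelity for smoothness
    A[:n, n:] = P
    A[n:, :n] = P.T
    sol = np.linalg.solve(A, np.concatenate([z, np.zeros(3)]))
    return sol[:n], sol[n:]

def predict_tps(xy_fit, w, a, xy_new):
    """Evaluate the fitted trend surface at new locations."""
    K = _tps_kernel(np.linalg.norm(xy_new[:, None, :] - xy_fit[None, :, :], axis=2))
    return K @ w + a[0] + xy_new @ a[1:]

# Hypothetical sample: locations scaled to the unit square, errors = planar trend + noise.
rng = np.random.default_rng(1)
xy = rng.random((60, 2))
err_x = 1.0 + 2.0 * xy[:, 0] - 3.0 * xy[:, 1] + rng.normal(0.0, 0.2, 60)

w, a = fit_tps(xy, err_x, lam=0.5)
trend_x = predict_tps(xy, w, a, xy)    # trend at the sample locations
resid_x = err_x - trend_x              # de-trended errors passed on to simulation
print(resid_x.mean(), resid_x.std(ddof=1))
```

Because the affine part is unpenalized, the residuals of this fit sum to zero, which is consistent with treating them as a zero-mean field in the subsequent simulation.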
2.4. DS for Simulating Positional Errors
DS was used to simulate a large number of equal-probable realizations of residuals (and then of positional errors). DS is a conceptually simple, yet functionally powerful, MPS algorithm, as it directly samples the TD for a given data event instead of counting and storing the configurations found in TIs, as in some other MPS algorithms [53].
DS proceeds as follows. Starting from an initial randomly located point along a chosen random path, the TD are scanned. For each successive sampling window, the mismatch (distance) between the data event informed in the simulation grid (SG) and the data event sampled from the TD is calculated. If the mismatch is lower than a given threshold, the sampling process is stopped and the value at the central node of the data event in the TD is directly taken as the simulated value at the SG node under consideration [58,59]. The mismatch metrics used in DS are described in Appendix A.
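The scanning and thresholding logic described above can be sketched in a few lines. The code below is a simplified illustration, not the G2S/QS implementation used in the study: it assumes a fully informed gridded training image, draws candidate locations at random instead of following a scan path, and uses a range-normalized mean absolute difference as the mismatch (with equal weight for each variable). All names and parameter values are hypothetical.

```python
import numpy as np

def direct_sampling(ti, sg_shape, cond=None, n_neigh=8, threshold=0.05,
                    max_scan=200, seed=0):
    """Simulate one (rows, cols, V) realization from a (H, W, V) training image.

    cond: optional {(row, col): values} conditioning data placed on the grid first.
    """
    rng = np.random.default_rng(seed)
    H, W, V = ti.shape
    sg = np.full(tuple(sg_shape) + (V,), np.nan)
    for (r, c), v in (cond or {}).items():
        sg[r, c] = v
    # Per-variable range used to normalize mismatches (assumes a non-constant TI).
    scale = ti.max(axis=(0, 1)) - ti.min(axis=(0, 1))
    path = np.argwhere(np.isnan(sg[..., 0]))
    for r, c in path[rng.permutation(len(path))]:      # random simulation path
        informed = np.argwhere(~np.isnan(sg[..., 0]))
        if len(informed) == 0:
            sg[r, c] = ti[rng.integers(H), rng.integers(W)]
            continue
        lags = informed - (r, c)
        nearest = np.argsort((lags ** 2).sum(axis=1))[:n_neigh]
        lags = lags[nearest]                            # data-event geometry
        values = sg[informed[nearest, 0], informed[nearest, 1]]
        best_val, best_d = None, np.inf
        for _ in range(max_scan):
            tr, tc = rng.integers(H), rng.integers(W)   # candidate TI node
            rr, cc = tr + lags[:, 0], tc + lags[:, 1]
            if rr.min() < 0 or cc.min() < 0 or rr.max() >= H or cc.max() >= W:
                continue                                # data event does not fit in the TI
            d = np.mean(np.abs(ti[rr, cc] - values) / scale)
            if d < best_d:
                best_val, best_d = ti[tr, tc], d
            if d <= threshold:
                break                                   # accept the first good match
        if best_val is None:
            best_val = ti[rng.integers(H), rng.integers(W)]
        sg[r, c] = best_val
    return sg

# Hypothetical bivariate training image standing in for de-trended X and Y residuals.
yy, xx = np.mgrid[0:40, 0:40]
ti = np.stack([np.sin(xx / 6.0) + 0.3 * np.cos(yy / 4.0),
               np.cos(xx / 5.0) * np.sin(yy / 7.0)], axis=-1)
cond = {(5, 5): ti[10, 10], (12, 3): ti[20, 8]}
real = direct_sampling(ti, (15, 15), cond=cond)
print(real.shape)
```

Because both variables are pasted from the same training location, the X and Y residuals are simulated jointly and their cross-dependence in the training data is carried into each realization. Repeating the call with different seeds gives a set of realizations.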
An MPS framework, G2S [60], was used in this research. We applied a resolution of 80 m for gridding TD and for specifying SG nodes. In this way, matching between data events in TD and SG, with a geometric tolerance of 40 m, was implicitly enforced. DS was run using the multivariate approach, as positional errors are bivariate (with X and Y coordinates). Equal weighting was applied to the X and Y coordinates.
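The link between the 80 m grid and the 40 m tolerance can be seen by snapping a scattered point to its nearest grid node: the offset along each axis never exceeds half a cell. The sketch below assumes nodes at integer multiples of the cell size from a (0, 0) origin, and interprets the tolerance per axis; both are assumptions made for illustration.

```python
import math

def snap_to_grid(x, y, cell=80.0):
    """Return (row, col, offset_x, offset_y) for the grid node nearest to (x, y).

    Assumes nodes at integer multiples of `cell` from a (0, 0) origin.
    """
    col = math.floor(x / cell + 0.5)
    row = math.floor(y / cell + 0.5)
    return row, col, col * cell - x, row * cell - y

# Hypothetical training point at (119 m, 41 m).
row, col, off_x, off_y = snap_to_grid(119.0, 41.0)
print(row, col, off_x, off_y)
```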
The simulated error surfaces (100 surfaces) were summarized at individual SG nodes as maps of the means, standard deviations, and covariance (between X and Y) of residuals. The estimated trend surface was added to the mean residuals to obtain a surface of mean positional errors, while the maps of standard deviation and X–Y covariance in positional errors were equal to those of residuals.
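The per-node summary can be written compactly with array operations. In the sketch below, the residual realizations and trend surfaces are random stand-ins (not outputs of the study), and the function name is hypothetical; the point is that adding a deterministic trend shifts the mean but leaves the standard deviation and the X–Y covariance unchanged.

```python
import numpy as np

def summarize_realizations(res_x, res_y, trend_x, trend_y):
    """Per-node metrics from residual realizations of shape (n_real, rows, cols)."""
    n = res_x.shape[0]
    mean_x = res_x.mean(axis=0) + trend_x          # mean positional error in X
    mean_y = res_y.mean(axis=0) + trend_y          # mean positional error in Y
    sd_x = res_x.std(axis=0, ddof=1)               # unaffected by the trend
    sd_y = res_y.std(axis=0, ddof=1)
    dx = res_x - res_x.mean(axis=0)
    dy = res_y - res_y.mean(axis=0)
    cov_xy = (dx * dy).sum(axis=0) / (n - 1)       # X-Y covariance per node
    return mean_x, mean_y, sd_x, sd_y, cov_xy

# Stand-in inputs: 100 realizations on a small 5 x 6 grid.
rng = np.random.default_rng(3)
res_x = rng.normal(0.0, 1.0, size=(100, 5, 6))
res_y = 0.5 * res_x + rng.normal(0.0, 1.0, size=(100, 5, 6))
trend_x = np.full((5, 6), 2.0)
trend_y = np.full((5, 6), -1.0)

mean_x, mean_y, sd_x, sd_y, cov_xy = summarize_realizations(res_x, res_y, trend_x, trend_y)
print(mean_x.shape, float(cov_xy.mean()))
```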
It is useful to view these surfaces of positional-error metrics as imaginary fields of corresponding quantities over the image domains, even over image segments in which the objects of interest are not visible, as is often the case with HSR images of urban scenes. This is because images are often not truly orthorectified, as explained in Section 1.
2.5. Positional-Error Propagation in Road Centerlines
In this research, error propagation aimed to simulate errors at vertices defining a set of RCLs that was chosen for experiments. With a number of equal-probable realizations of residuals simulated, as described in Section 2.4, it was possible to obtain realizations of positional-error residuals at RCL vertices (RCLVs, as shown in Figure 1) by interpolation, as an approximate approach (a theoretically rigorous way would be simulation at RCLVs directly). The simulations at RCLVs could then be summarized to obtain the metrics of the residuals of the positional errors therein.
For simulating positional errors at RCLVs, we estimated trend values of positional errors at the vertices by interpolating on the trend surfaces estimated separately (see Section 2.3). Positional-error realizations at the vertices were obtained by adding the estimated trend-surface values to residual realizations. The mean positional errors were the sum of mean residuals and trend-surface values, while the standard deviation and the X–Y covariance at individual vertices remained the same as those in the residuals.
Realizations of RCLs were then obtained from the positional-error realizations at their vertices. In addition, the lengths of some of the RCL segments were summarized in terms of means and standard deviation.
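The propagation from gridded error realizations to line vertices and line lengths can be sketched as follows. Bilinear interpolation is assumed as the interpolator, grid nodes are assumed to sit at (col × cell, row × cell) from a (0, 0) origin, and each line realization is formed as the digitized vertices minus the simulated errors, following the definition of error as measured minus reference. The error fields, the polyline, and the function names are hypothetical.

```python
import numpy as np

def bilinear(grid, x, y, cell):
    """Bilinear interpolation of grid[row, col] at coordinates x (columns) and y (rows)."""
    fx, fy = np.asarray(x) / cell, np.asarray(y) / cell
    c0 = np.clip(np.floor(fx).astype(int), 0, grid.shape[1] - 2)
    r0 = np.clip(np.floor(fy).astype(int), 0, grid.shape[0] - 2)
    tx, ty = fx - c0, fy - r0
    top = grid[r0, c0] * (1 - tx) + grid[r0, c0 + 1] * tx
    bottom = grid[r0 + 1, c0] * (1 - tx) + grid[r0 + 1, c0 + 1] * tx
    return top * (1 - ty) + bottom * ty

def simulate_line(vertices, err_x_real, err_y_real, cell):
    """Return (n_real, n_vertices, 2) line realizations from gridded error realizations."""
    lines = []
    for err_x, err_y in zip(err_x_real, err_y_real):
        dx = bilinear(err_x, vertices[:, 0], vertices[:, 1], cell)
        dy = bilinear(err_y, vertices[:, 0], vertices[:, 1], cell)
        lines.append(vertices - np.column_stack([dx, dy]))
    return np.array(lines)

def polyline_length(xy):
    """Total length of a polyline given as an (n, 2) array of vertices."""
    return float(np.linalg.norm(np.diff(xy, axis=0), axis=1).sum())

# Hypothetical digitized centerline (metres) and stand-in error realizations on a 6 x 6 grid.
vertices = np.array([[0.0, 0.0], [300.0, 0.0], [300.0, 400.0]])
rng = np.random.default_rng(2)
err_x_real = rng.normal(0.0, 1.0, size=(20, 6, 6))
err_y_real = rng.normal(0.0, 1.0, size=(20, 6, 6))

lines = simulate_line(vertices, err_x_real, err_y_real, cell=80.0)
lengths = np.array([polyline_length(line) for line in lines])
length_mean, length_sd = lengths.mean(), lengths.std(ddof=1)
print(length_mean, length_sd)
```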
2.6. Realizations of Reference Positions for Road Centerlines
As described in Section 2.5, positional errors at SG nodes and RCLVs can be simulated in a large number of equal-probable realizations. Simulated positional errors can also be used to simulate reference positions of points of interest. This amounts, effectively, to enhanced georectification.
For this purpose, we applied georectification, through error correction, to some selected RCL segments for which reference data were available over the test image (so that accuracy gains could be evaluated). Alternatively, using CD as control data, we also rectified the test image based on a triangulated irregular network (TIN).
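Error correction itself is a one-line operation once mean positional errors are available: with error defined as measured minus reference, the estimated reference position is the measured position minus the mean error. The sketch below uses made-up coordinates and a made-up mean error to show the correction and a before/after accuracy check; it does not reproduce the TIN-based alternative, and the function names are hypothetical.

```python
import numpy as np

def correct_positions(measured_xy, mean_err_xy):
    """Estimate reference positions as measured positions minus mean positional errors."""
    return measured_xy - mean_err_xy

def rmse(a, b):
    """Root-mean-square horizontal distance between two (n, 2) coordinate arrays."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Hypothetical vertices (metres): the measured line is shifted and slightly noisy.
rng = np.random.default_rng(4)
reference_xy = np.array([[0.0, 0.0], [100.0, 50.0], [200.0, 120.0], [300.0, 150.0]])
measured_xy = reference_xy + np.array([3.0, -2.0]) + rng.normal(0.0, 0.3, size=(4, 2))
mean_err_xy = np.tile([3.0, -2.0], (4, 1))    # stand-in for simulated mean errors at the vertices

corrected_xy = correct_positions(measured_xy, mean_err_xy)
print(rmse(measured_xy, reference_xy), rmse(corrected_xy, reference_xy))
```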