1. Introduction
Permanent scatterer interferometry (PSInSAR) [2] is an advanced multi-baseline SAR interferometry approach used to estimate surface motion differences between temporally stable and coherent points, known as permanent scatterers (PS). PSInSAR is probably the best-known multi-baseline SAR interferometry approach, but it is certainly not the only one. A variety of approaches, such as SBAS [3], StaMPS [4], STUN [5], or PSP [6], as well as extensions, such as SqueeSAR [7], QPS [8], and many more, have been developed over the years.
PSInSAR and similar methods are known for their high precision in estimating surface motion. Reported standard deviations of the error in the estimated linear velocity range from 0.4 mm/yr to 0.5 mm/yr [9]. Sub-millimeter accuracy and a standard deviation of errors of
was reported in an experiment with TerraSAR-X data [10]. About 3
was reported by [
According to [12], the predicted absolute standard deviation for an ideal scatterer is
N is the number of images, and the results are given in
. The best measured
was with
in the experiments of Adam et al. [12]. Therefore, in our experiments with a stack of 32 images in Wuhan and a stack of 51 images in Naples, Italy, a standard deviation of the error of about
for Wuhan and
for Naples would be expected for (unrealistic) ideal scatterers.
Assuming less-ideal conditions, for time series, errors in the range of 1 mm/yr to 2 mm/yr are to be expected. Although we are not challenging these numbers, we suggest caution in over-interpreting these results, especially when analyzing a single PS instead of interpreting the results in a wider, regional context. Errors can exceed these values, and we will show differences in the estimated velocity values exceeding these expectations by changing only minor parameters.
Considering the large variety of processing methods, it must be recognized that each method typically yields different results. This is not surprising, as these methods were developed to address shortcomings of previous methods and are intended to deliver different, ideally better, results. Different models and methods therefore lead to different results, with differences that can far exceed several mm/yr in velocity.
However, these differences have seldom been discussed in the literature. One exception is the Milillo et al. versus Lanari et al. discussion [15]. The general trends of surface motion patterns are normally similar across methods. The differences in the details can be substantial, though, especially if unsuitable models are used, for example, models that estimate linear velocities [16] for nonlinear deformations [17]. Considering these differences, the often-cited accuracies of multi-temporal baseline methods are certainly overstated, especially because not all processing will be conducted in an optimal manner: the true nature of the deformation pattern is unknown, and if it were known, we would not need PSInSAR to begin with.
In this study, we are looking at even smaller differences: differences in the final results caused by small changes in the parameters used in PSInSAR processing. Our main focus is on the boundary parameters describing the estimation range for the velocity and residual height estimations in the periodogram. These differences are generally small but can cause unwrapping errors that affect the majority of the points, as we will show.
In addition, a typical quality control method using temporal coherence is not directly suitable for addressing these issues. Temporal coherence measures how well the observations fit a given model. However, each set of parameters constitutes a different model, and the fit to Model A is not directly comparable to the fit to Model B. This is often misunderstood when using PSInSAR.
In general, these problems are well known by many practitioners in PSInSAR, but are not well communicated in the literature. The advice often given to beginners is to ‘play around’ with the parameters and test different sets of parameters, i.e., models, until a good set of parameters is found. The definition of ‘good’ in this case is typically not based on any value, although sometimes temporal coherence is wrongly used, but rather based on a visual impression of a meaningful and noise-free result.
The authors have long considered such practice insufficient, as it lacks scientific rigor and reproducibility, and the precise model parameters are often not fully published in the literature. In this technical note, we describe the problem clearly but do not aim to solve it. The goal is to raise awareness, to urge restraint in interpretation, and to avoid over-promising accuracies that often cannot be met in practice.
3. Results
To illustrate the issue, we can compare the results of the PSInSAR processing using parameter estimation ranges of
Figure 2 and Figure 3 for the test sites in Wuhan and Naples, respectively. The differences are small. The general trend is similar in these examples, but not all points show identical velocities.
For a more detailed view, we show the final results of the processing in our test areas with different parameters in Figure 4 and Figure 5. In both figures, the results are shown in a matrix with different parameters for the residual height in the columns and the range of estimated velocities in the rows. We selected four parameters for the velocity (±40 mm/yr, ±60 mm/yr, ±80 mm/yr, and ±100 mm/yr) and four parameters for the residual height (±50 m, ±100 m, ±150 m, and ±200 m). These parameters are based on previous experiments in the Wuhan area [20] that show estimated velocities below 40 mm/yr. The building heights in both areas are normally below 50 m, with a couple of buildings in Wuhan exceeding 50 m but remaining below 100 m.
As shown in Figure 4 and Figure 5, the overall deformation trend is similar in the 16 images shown for each test area. However, looking closely, differences can be observed between them.
We selected four parameter settings for the estimation of the velocity: ±40 mm/yr, ±60 mm/yr, ±80 mm/yr, and ±100 mm/yr. With the surface velocity below 40 mm/yr, all four settings should yield similar velocity estimates. For the estimation of the residual height, we selected four settings: ±50 m, ±100 m, ±150 m, and ±200 m. Most buildings, especially in the Naples area, are below 50 m in height. In the Wuhan test area, very few buildings are above 100 m in height, and none is above 150 m in the given area. Therefore, the vast majority of PS points should also be estimated correctly for the residual height given these parameters, especially considering that most PS on buildings are on the building facades, with only a few on or near the roofs.
From the four settings for the velocity and the four settings for the residual height, we form a matrix of 16 parameter combinations. This allows us to follow the effect of changing the velocity and residual height parameters together, as well as to analyze the effect of changing only one of them.
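The 4 × 4 layout described above can be enumerated programmatically. The following is a minimal sketch, not the processing code used for this note; the names `VELOCITY_BOUNDS_MM_YR`, `HEIGHT_BOUNDS_M`, and `build_settings` are hypothetical and chosen for illustration only.

```python
from itertools import product

# Symmetric search bounds taken from the text: +/- value around zero.
VELOCITY_BOUNDS_MM_YR = (40, 60, 80, 100)   # rows of the result matrix
HEIGHT_BOUNDS_M = (50, 100, 150, 200)       # columns of the result matrix


def build_settings(v_bounds=VELOCITY_BOUNDS_MM_YR, h_bounds=HEIGHT_BOUNDS_M):
    """Return one dict per (velocity bound, height bound) combination.

    'row' and 'col' give the position in the matrix layout
    (velocity in rows, residual height in columns).
    """
    return [
        {"row": i, "col": j, "v_max_mm_yr": v, "h_max_m": h}
        for (i, v), (j, h) in product(enumerate(v_bounds), enumerate(h_bounds))
    ]


settings = build_settings()

# All runs that share the narrowest velocity bound and differ only in height.
first_row = [s for s in settings if s["row"] == 0]
print(len(settings), "settings;", len(first_row), "in the first row")
```

Selecting a single row or column in this way isolates the effect of changing only one of the two bounds.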
The results shown in Figure 4 and Figure 5 are rather atypical, as we show the unfiltered results, including all PS candidates. Typically, the results are filtered based on the temporal coherence, reducing noise and outliers by only including PS points that show a good fit to the model. A fixed threshold for the temporal coherence is used for the filter. In our examples, we will use a threshold of ≥0.8 for the temporal coherence, which is a value often used and, as we will show below, is also a suitable threshold for our examples.
We can analyze the suitability of the threshold for outlier removal by looking at the relationship between the temporal coherence and the estimated velocity in Figure 6.
As the area of interest is fairly stable, we would expect to see most points centered around 0 mm/yr velocity. This is the case for points with high temporal coherence, that is, points that fit the model well. There are more such points in Naples, shown on the right side of Figure 6. The Wuhan stack appears somewhat noisier. From the scatter plot, we can also derive a good threshold for the temporal coherence. In this example, although most outliers were filtered out with a threshold of 0.8, a higher threshold would be slightly more appropriate for Wuhan, while a threshold even slightly below 0.8 would still be suitable for the stack in Naples.
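The threshold selection discussed above can be supported numerically by tabulating, for several candidate thresholds, how many points survive and how widely their velocities scatter. The sketch below is an illustration only: the function name `summarize_thresholds` is hypothetical, the standard deviation is just one possible spread measure, and the data are synthetic, not the Wuhan or Naples results.

```python
import numpy as np


def summarize_thresholds(velocity, coherence, thresholds=(0.6, 0.7, 0.8, 0.9)):
    """For each threshold, return (threshold, points kept, velocity std)."""
    velocity = np.asarray(velocity, dtype=float)
    coherence = np.asarray(coherence, dtype=float)
    rows = []
    for thr in thresholds:
        keep = coherence >= thr
        n_kept = int(keep.sum())
        # The spread is undefined if no point passes the threshold.
        spread = float(np.std(velocity[keep])) if n_kept > 0 else float("nan")
        rows.append((thr, n_kept, spread))
    return rows


# Synthetic stand-in for a stable area: well-fitting points scatter around
# zero velocity, poorly fitting points are spread over the search range.
rng = np.random.default_rng(0)
n_points = 5000
coh = rng.uniform(0.3, 1.0, n_points)
vel = np.where(coh > 0.75,
               rng.normal(0.0, 1.0, n_points),      # mm/yr, good fit
               rng.uniform(-40.0, 40.0, n_points))  # mm/yr, outliers

for thr, n_kept, spread in summarize_thresholds(vel, coh):
    print(f"threshold {thr:.1f}: {n_kept} points kept, std {spread:.2f} mm/yr")
```

A threshold at which the spread stops decreasing noticeably, while the number of retained points is still acceptable, is a candidate for the dataset at hand.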
Looking at the entire set of points, we see grouping in several blocks. The noisy points are not evenly and randomly spread with respect to the velocity but form blocks. This indicates that the error is generally caused by phase unwrapping errors, which lead to jumps in the estimated velocities and cause this error pattern.
In Figure 7 and Figure 8, we examine the scatter plots for all tested parameters for Wuhan and Naples.
As shown in Figure 7 and Figure 8, the patterns are similar for different parameters but not identical. The patterns change slightly, and we can also observe a shift in the average temporal coherence between the examples, leading to differences in the appropriate thresholds for the temporal coherence filter. The changes in the average temporal coherence are listed in Table 1 and Table 2.
We expect an increase in the average temporal coherence with an increase in the parameter range in our experiment. As the estimator in Equation (5) maximizes temporal coherence, a wider parameter range should lead to a higher temporal coherence if a better fit has been found. Otherwise, we would expect an identical temporal coherence for identical parameter estimates. This pattern mostly holds in the example of Naples shown in Table 2, although the temporal coherence, e.g., for
, is significantly below the surrounding temporal coherences.
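The expectation stated above, that a wider search range cannot lower the maximum of the periodogram for a single arc, can be checked with a small simulation. The sketch below uses the standard linear phase model (velocity plus residual height) and a brute-force grid search; it is not the implementation behind Equation (5). The wavelength, slant range, incidence angle, acquisition times, and baselines are assumed values, and `model_phase` and `periodogram` are hypothetical names.

```python
import numpy as np

# Assumed sensor geometry (illustrative X-band values, not from the text).
WAVELENGTH_M = 0.031
SLANT_RANGE_M = 600e3
INCIDENCE_RAD = np.deg2rad(35.0)


def model_phase(v_m_yr, h_m, t_yr, bperp_m):
    """Phase of a linear-velocity plus residual-height model (radians)."""
    height_term = bperp_m * h_m / (SLANT_RANGE_M * np.sin(INCIDENCE_RAD))
    return 4.0 * np.pi / WAVELENGTH_M * (v_m_yr * t_yr + height_term)


def periodogram(obs_phase, t_yr, bperp_m, v_max, h_max, v_step=1e-3, h_step=1.0):
    """Grid search for the (v, h) pair that maximizes temporal coherence.

    The grids are built from integer multiples of the step so that a
    narrower range is always a subset of a wider one.
    """
    nv, nh = round(v_max / v_step), round(h_max / h_step)
    v_grid = np.arange(-nv, nv + 1) * v_step
    h_grid = np.arange(-nh, nh + 1) * h_step
    vv, hh = np.meshgrid(v_grid, h_grid, indexing="ij")
    model = model_phase(vv[..., None], hh[..., None], t_yr, bperp_m)
    # Temporal coherence: magnitude of the mean residual phasor per grid cell.
    gamma = np.abs(np.exp(1j * (obs_phase - model)).mean(axis=-1))
    i, j = np.unravel_index(np.argmax(gamma), gamma.shape)
    return v_grid[i], h_grid[j], gamma[i, j]


# Noise-free synthetic arc: 32 images, 11-day repeat, random baselines.
rng = np.random.default_rng(0)
t = np.arange(32) * 11.0 / 365.25
bperp = rng.uniform(-200.0, 200.0, t.size)
obs = np.angle(np.exp(1j * model_phase(0.005, 20.0, t, bperp)))  # wrapped

for v_max, h_max in [(0.040, 50.0), (0.080, 100.0)]:
    v_est, h_est, gamma_max = periodogram(obs, t, bperp, v_max, h_max)
    print(f"+/-{v_max * 1e3:.0f} mm/yr, +/-{h_max:.0f} m: "
          f"v = {v_est * 1e3:.1f} mm/yr, h = {h_est:.0f} m, coherence = {gamma_max:.3f}")
```

For one arc and nested grids, the maximum can only stay the same or increase. A decrease in the mean temporal coherence of the final result must therefore come from elsewhere in the processing chain.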
In our experiment in Wuhan, shown in Table 1, we found a decrease in the average temporal coherence for velocity ranges above
and for residual height ranges above
in Wuhan. Given Equation (5), how is this reduction in mean temporal coherence possible? The answer lies in the APS estimation. Equation (5) is not only used to estimate the differences in the estimated velocity and the residual height between the reference point and all PS, but also along each edge of the sparse PSC network used to estimate the APS. Differences and errors in the parameter estimation lead to differences in the estimated APS. Errors in APS lead to errors in the final processing, which can lead to reduced temporal coherence for the final estimation at each PS point.
Therefore, a reduction in the mean temporal coherence with a widening parameter range is a good indicator of an error. This information can be used for better parameter estimation. Nevertheless, care must be taken in the interpretation of temporal coherence, as it is a measurement of fit to a model. A comparison across models has to be undertaken with great care. A lower mean temporal coherence for a smaller parameter range or a model with fewer degrees of freedom is to be expected and is not necessarily a sign of that model being less accurate or less appropriate.
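The indicator described above can be turned into a simple check on a table of mean temporal coherences. The sketch below flags every cell whose mean coherence is lower than that of its neighbor with the next narrower velocity or height range. The function name `coherence_drops` and the optional tolerance are hypothetical, and the numbers are made up for illustration; they are not the values of Table 1 or Table 2.

```python
import numpy as np


def coherence_drops(mean_coherence, tol=0.0):
    """Flag cells where widening a range lowered the mean coherence.

    mean_coherence[i, j]: rows ordered by increasing velocity bound,
    columns ordered by increasing residual-height bound.
    """
    m = np.asarray(mean_coherence, dtype=float)
    drop_v = np.zeros(m.shape, dtype=bool)
    drop_h = np.zeros(m.shape, dtype=bool)
    # Compare each cell with the next narrower velocity bound (row above).
    drop_v[1:, :] = np.diff(m, axis=0) < -tol
    # Compare each cell with the next narrower height bound (column to the left).
    drop_h[:, 1:] = np.diff(m, axis=1) < -tol
    return drop_v | drop_h


# Made-up example values, for illustration only.
example = np.array([
    [0.70, 0.71, 0.71, 0.71],
    [0.71, 0.72, 0.69, 0.69],
    [0.71, 0.72, 0.68, 0.68],
    [0.71, 0.72, 0.68, 0.67],
])
print(coherence_drops(example).astype(int))
```

A flagged cell only indicates that something in the processing chain reacted badly to the wider range. It does not by itself say which of the compared settings is closer to the truth.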
For our next analysis, we form a vector of velocity values for each PS point from the 16 parameter ranges tested. In Figure 9 and Figure 10, we show the range of velocities, i.e., the maximum velocity in a given vector minus the minimum velocity in that vector. The left side of Figure 9 and Figure 10 shows this range for all PS without temporal coherence filtering, while on the right side, results with a temporal coherence below 0.8 are excluded from the calculation of the value range. A detailed overview of the number of points in each class is given in Table 3 and Table 4.
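The velocity range per PS point, with and without the temporal coherence filter, can be computed as in the following sketch. It assumes that the results of all settings are stacked into arrays of shape (points, settings); the function name `velocity_range` and the toy numbers are illustrative and not taken from our data.

```python
import numpy as np


def velocity_range(velocities, coherence=None, threshold=0.8):
    """Max minus min velocity per point over all parameter settings.

    If 'coherence' is given, estimates with coherence below 'threshold'
    are ignored. Points with no remaining estimate get NaN and are marked
    False in the returned 'has_any' mask.
    """
    v = np.asarray(velocities, dtype=float)
    if coherence is None:
        valid = np.ones(v.shape, dtype=bool)
    else:
        valid = np.asarray(coherence, dtype=float) >= threshold
    has_any = valid.any(axis=1)
    v_hi = np.where(valid, v, -np.inf).max(axis=1)
    v_lo = np.where(valid, v, np.inf).min(axis=1)
    v_range = np.full(v.shape[0], np.nan)
    v_range[has_any] = v_hi[has_any] - v_lo[has_any]
    return v_range, has_any


# Toy example: 3 points, 3 settings (velocities in mm/yr).
vel = [[1.0, 2.0, 9.0], [0.0, 0.5, 0.2], [3.0, 3.0, 3.0]]
coh = [[0.90, 0.85, 0.30], [0.90, 0.90, 0.90], [0.10, 0.20, 0.30]]

unfiltered, _ = velocity_range(vel)
filtered, kept = velocity_range(vel, coh)
share_above_5 = float(np.mean(unfiltered > 5.0))
print(unfiltered, filtered, kept, share_above_5)
```

Points for which no setting passes the threshold drop out entirely, which is why the number of points can differ between the unfiltered and filtered statistics.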
As can be seen from the results on the left side of Figure 9 and Figure 10, a good portion of the PS is affected by outliers in at least one of the 16 parameter settings. About 33% of all PS points in the Wuhan example show a difference of above 5 mm/yr between the smallest and largest estimated velocities within the 16 parameter ranges used. For the Naples example, this applies to just 5.8% of the points, but about 28.6% of the points have a velocity difference of above 3 mm/yr, as shown in Table 3 and Table 4.
As shown in Figure 7, errors caused by unsuitable parameter ranges typically appear in the form of unwrapping errors, that is, rather large jumps in the results. This makes these errors large but also easily detectable as outliers. Therefore, it is typically relatively easy to filter out such errors, e.g., by temporal coherence filtering, as shown on the right side of Figure 9 for Wuhan, but less so in Figure 10, which shows the results from Naples.
In the example of Naples, a good number of points show large differences in the velocities estimated with different parameters even after parameter ranges with a low temporal coherence have been filtered out, as can be seen on the right side of Figure 10. Errors above 3 mm/yr are common, representing about 23% of all points, and they appear especially at the edges of the test area in Naples.
In Wuhan, as shown in Figure 9, the majority of points show small differences between parameter ranges after filtering, with the vast majority, i.e., 98.9%, having differences below 1 mm/yr after filtering by the temporal coherence. Figure 9 is even a bit misleading, as it gives the impression that more points have differences above 1 mm/yr. If we compare the numbers in Table 3 and Table 4, we see that in the case of Wuhan the temporal coherence filter removes the outliers effectively. Most PS points with large outliers in the unfiltered example are removed completely by the filter; that is, their temporal coherence is below 0.8 for all 16 tested parameter settings. As a result, the number of points is reduced from 57,919 to only 41,090 after filtering.
This does not happen in the case of Naples, where 94.1% of the points retain at least one parameter setting above the temporal coherence threshold of 0.8. Temporal coherence filtering is much less effective in the case of Naples, with a significant number of points showing large differences between the processing runs with different parameter bounds. This is all the more surprising because the relationship between temporal coherence and estimated velocity in, e.g., Figure 6 would suggest that the Naples dataset has fewer outliers after temporal coherence filtering, while the Wuhan test data seemed noisier.
We can see significant differences between PSInSAR results, even when only slightly changing the bounds of the parameter space. These differences can persist even after post-processing, which indicates that some of the reported results may only have been achieved after extensive parameter adjustment; hence our call for including these parameters in publications.
The spatial pattern of the larger differences in the example of Naples (Figure 10) indicates that the main cause of the widespread errors is the APS estimation. With significant differences in the parameter estimation along the edges, the estimated APS will differ when processing with different parameters. These differences in the APS will affect all points in the final process, but due to the unwrapping along the edges and the resulting possibility of error propagation, the edges of the processed scene are more affected than the center.
Another topic of this technical note is the limited suitability of temporal coherence for comparing different parameter ranges. We demonstrated with the example of Naples in Figure 10 that even after filtering with temporal coherence, large errors can remain.
4. Discussion
Although the observed differences are small, they exceed expectations based on previously published results (e.g., [9]). For many applications, though, that does not matter much, especially if a proper post-processing filter is applied. However, we looked at only one set of parameters, probably the least significant one in the PSInSAR processing chain, and demonstrated its relatively large effect on the final results. The observed differences in velocities reach or exceed previously published error ranges, especially towards the edges of the processed area. The main question this raises is: If these rather insignificant parameters already have such an effect, what about the effects of different models, methods, approaches, and techniques on multi-baseline SAR interferometry?
This calls into question how previous results have been achieved. How much parameter tuning was involved, and how little of this was presented and discussed in the publications? This shows that without the full parameter sets being published, reproducibility cannot be achieved. For the same dataset, differences caused by only one parameter setting reach 1 mm/yr to 2 mm/yr for many points in our Wuhan example and are far above that in the Naples example. However, in previously published works, the comparisons between different approaches reach an agreement within a few millimeters. We assume that this has been achieved by carefully adjusting the parameters.
Adjusting the parameters for a good fit is common practice and not in itself a problem. However, we believe that this should be more clearly discussed and published. The discussion on the differences between the models and approaches is limited, and the focus is often on promoting the high accuracy of the methods, while the large differences between the models and the parameter settings are not openly discussed.
There is a spirit of ‘promoting’ multi-baseline InSAR techniques. After more than 20 years of successful applications, we do not think much more promotion is necessary. Practitioners and agencies that use InSAR often use it for wide-area surveillance and for identifying zones of subsidence risk. They often rely on further ground measurements for final decision making. This is a reasonable approach because many are well aware that the promised millimeter or sub-millimeter accuracies are theoretical. In practice, we can identify subsidence and uplift zones with different D-InSAR methods, but over- or under-estimation of subsidence velocities is fairly common.
Our results are not surprising for these practitioners. We want others to understand the limitations of these techniques and avoid over-interpretation of the results. If the parameter settings change the results by millimeters, one should understand that. Very small motions, for example, / should be interpreted with care, as they might be in that error range. To make this clear, very careful PSInSAR processing can achieve higher accuracies, but misinterpretation and over-interpretation of results are also very common in the scientific discourse.
Our results also contribute to the discussion of APS filtering. In PSInSAR, an APS filter is applied after unwrapping along the edges [2]; other methods, such as STUN [5] or PSP [6], do not. The problem we describe here is mostly caused by differences in unwrapping along the edges, leading to differences in the APS estimation, and finally causing errors for points that may not themselves be unstable. Thus, methods that avoid APS estimation may be advantageous.
However, parameter estimation errors in the network in these methods may cause even more differences owing to the error propagation along the network. Filtering the parameter estimation errors during the low- and high-pass filtering in the APS estimation can reduce their effects. APS filtering and re-calculation of the parameters from a single reference point also reduced the mean temporal coherence for the parameter ranges that were too wide in our experiments. With different estimation approaches, we may not be able to identify this reduction, and the post-processing filter may be less effective.
In this technical note, we point toward a technical issue within PSInSAR processing, but do not offer a solution. We want to raise awareness. Nevertheless, based on our results, some improvements can be made during processing. First, we suggest taking a look at the data, for example, by preparing a scatter plot of the temporal coherence with respect to velocity, residual height, and other estimated parameters, as shown in our experiments in Figure 7 and Figure 8. This can be used for better selection of a temporal coherence threshold suitable for a specific dataset and an estimated model. Furthermore, a more standardized testing of different parameters and models can be implemented within a multi-parameter analysis. Such an analysis could incorporate the range of velocity differences from different parameter settings as shown in Figure 9 and Figure 10. Based on such an analysis, error ranges based on parameter differences could be established and be considered in the interpretation of the results.
With this technical note, we want to raise awareness and open a discussion, especially with respect to reproducibility and more insightful analyses of the results. Reproducibility requires that all parameter settings be disclosed. Because seemingly minor parameters have a relatively large effect, we deem this necessary. Good scientific practice should also include decision guidelines that lead to the selection of a specific set of parameters. Why was this method used for the processing? Why this model? Why these parameters?
Furthermore, we believe that this technical note can be a starting point for developing a solution. Analyzing the effects of parameters is important so that we can find an automated way to select the best parameters. It is possible to process scenes with multiple parameters and then move toward automated model selection.
Suitable for a technical note, the issue discussed here is rather small. The specific parameters selected are rather insignificant, which is why we selected them to demonstrate the effect of seemingly unimportant parameters. The parameter range is only relevant to solvers working within a parameter range, such as the periodogram, but not for other approaches, such as the LAMBDA approach in STUN [5] or approaches based on spatial instead of temporal unwrapping, such as StaMPS [4]. However, this does not mean that the results are not relevant to other methods. Other ‘silent’ parameters can also have a significant effect on the results of these methods.
We demonstrated the effect of a small set of parameter settings that should only have a minimal effect. If the actual deformation and residual heights are within these parameters, changing the parameter range should have minimal or no effect. However, many different parameters and approaches have been proposed. These include different ways of forming networks of PSC points, models that include temporal deformation, spatial unwrapping models, and models including nonlinear deformation. The differences between them are significantly larger than those shown here. With more changes, the differences between the methods, models, and parameters become even larger. These differences can far exceed any reported error range, emphasizing the need to discuss them and to fully disclose models and parameters in publications to ensure reproducibility.