**Stochastic Properties of Confidence Ellipsoids after Least Squares Adjustment, Derived from GUM Analysis and Monte Carlo Simulations**

**Wolfgang Niemeier <sup>1,\*</sup> and Dieter Tengen <sup>2</sup>**


Received: 30 June 2020; Accepted: 3 August 2020; Published: 8 August 2020

**Abstract:** In this paper, stochastic properties are discussed for the final results of an innovative approach for uncertainty assessment of network computations, which can be characterized as a two-step approach: In the first step, the raw measuring data and all possible influencing factors are analyzed, applying uncertainty modeling in accordance with the GUM (Guide to the Expression of Uncertainty in Measurement). In the second step, Monte Carlo (MC) simulations are set up for the complete processing chain, i.e., all input data are simulated and the adjustment computations are performed. The input datasets are generated by pseudorandom numbers, and pre-set probability distribution functions are considered for all these variables. The main extensions here relate to an analysis of the stochastic properties of the final results, which are point clouds for the station coordinates. According to Cramér's central limit theorem and Hagen's elementary error theory, there is some justification for why these coordinate variations follow a normal distribution. The statistical tests applied for normality confirmed this assumption. This result allows us to derive confidence ellipsoids from these point clouds and to continue with quality assessment and a more detailed analysis of the results, similar to the procedures well known in classical network theory. This approach and the check for normality are applied to the local tie network of Metsähovi, Finland, where terrestrial geodetic observations are combined with Global Navigation Satellite System (GNSS) data.

**Keywords:** GUM analysis; geodetic network adjustment; stochastic properties; random number generator; Monte Carlo simulation

#### **1. Introduction**

For decades, the quality concepts in geodesy have been based on classical statistical theory and generally accepted assumptions, such as the normal distribution of observations, possible correlations between observations and the law of variance propagation. For the least squares adjustment discussed here, the variance–covariance matrix of the unknowns is considered the best representation of the quality of the results.

These considerations are the basis for standard quality measures for precision, such as standard deviations, mean square errors and error or confidence ellipses, and they are prerequisites for the derivation of reliability measures, as well as for more detailed methods such as congruency analysis.

With the advent of the GUM, i.e., the "Guide to the Expression of Uncertainty in Measurement", see [1–3], which has found wide acceptance within the community of measuring experts in the natural sciences, physics and mechanical engineering, we may ask whether the traditional concepts for quality assessment of geodetic adjustment results are still valid or should rather be replaced by new, GUM-related measures.

In this paper, we will participate in this discussion and study the statistical properties of adjustment results, presenting a new approach in which the variations of the network adjustment results are derived by Monte Carlo simulations, while the variability of the input observations is computed in a rigorous procedure based on the rules of the GUM.

#### **2. Quality Assessment in Classical Geodetic Adjustment**

#### *2.1. Functional Model*

Within the established methodology (see e.g., [4,5]), quality assessment in geodetic network adjustment is based on the analysis of the covariance matrix Σ*xx* of the final adjusted coordinates *x*. In most cases, the starting point for the adjustment process is the Gauss–Markov (GM) model, given by the functional model.

$$l\_{(n,1)} + v\_{(n,1)} = A\_{(n,u)}\hat{x}\_{(u,1)},\tag{1}$$

which gives the functional relations between the observations *li* and the unknowns *xj* in a linear, often linearized, form. In Equation (1), *l* is the (*n*, 1) vector of observations *li*, which are in most cases reduced observations after linearization. *A* is the (*n*, *u*) coefficient or design matrix, known as the Jacobian matrix. The vector *x*(*u*, 1) contains the parameters *xi* of the adjustment problem, where here, without loss of generality, only coordinate unknowns are considered. The (*n*, 1) vector of residuals *v* accounts for possible inconsistencies between observations and unknowns.

#### *2.2. Stochastic Model*

The stochastic relations for and between the observations *li* are given by the (*n*, *n*) covariance matrix Σ*ll* of exactly those *n* quantities *li* that are used as input variables in the adjustment model, but see the critical remarks in Section 2.4. According to mathematical statistics, the covariance matrix for these input variables is given by:

$$
\Sigma\_{ll} = \begin{bmatrix}
\sigma\_1^2 & \rho\_{12}\sigma\_1\sigma\_2 & \cdots & \rho\_{1n}\sigma\_1\sigma\_n \\
\rho\_{21}\sigma\_2\sigma\_1 & \sigma\_2^2 & \cdots & \rho\_{2n}\sigma\_2\sigma\_n \\
\vdots & \vdots & \ddots & \vdots \\
\rho\_{n1}\sigma\_n\sigma\_1 & \rho\_{n2}\sigma\_n\sigma\_2 & \cdots & \sigma\_n^2
\end{bmatrix} \tag{2}
$$

where the terms σ<sub>i</sub><sup>2</sup> represent the variance estimates for the input variables *li*, and the terms ρ*ij*σ*i*σ*<sup>j</sup>* are the covariances between the variables *li* and *lj*. The correlation coefficient ρ*ij* between the input variables *li* and *lj* is rarely known; therefore, in most applications the stochastic model is reduced to a diagonal matrix, where correlations are no longer considered:

$$
\Sigma\_{ll(n,n)} = \begin{bmatrix}
\sigma\_1^2 & & & 0 \\
& \sigma\_2^2 & & \\
& & \ddots & \\
0 & & & \sigma\_n^2
\end{bmatrix}.
\tag{3}
$$

Some literature exists on estimating correlation coefficients, where serial correlation, external influencing factors or neglected effects are considered to obtain adequate ρ*ij* values. For GNSS observations [5], the application of correlation coefficients is standard practice, at least as blockwise (3, 3) correlation matrices for the coordinates or coordinate differences of a 3D network.

A further step to simplify the stochastic model and the computational effort is the usage of identical values for the a priori variances σ<sub>i</sub><sup>2</sup> for each type of observation (e.g., for directions, distances, height differences, coordinate differences). Often these simplifications are justified by the assumed minor influence of correlations on the coordinate estimates themselves.

#### *2.3. Traditional Quality Assessment*

For this common GM approach, the target function for a least squares adjustment is given by the well-known condition:

$$
\Omega = v^T Q\_{ll}^{-1} v = v^T P v,\tag{4}
$$

where the variance–covariance matrix is split up:

$$
\Sigma\_{\text{ll}} = \sigma\_0^2 Q\_{\text{ll}}.\tag{5}
$$

Here *Qll* is called the cofactor matrix of the observations and σ<sub>0</sub><sup>2</sup> is the variance of unit weight, which can be used to carry out an overall test of the adjustment model, see e.g., [4]. For the final results of the least squares adjustment, the coordinates (more precisely: corrections to the approximate coordinates) are computed by the well-known formula:

$$\hat{x} = \left(A^T Q\_{ll}^{-1} A\right)^{-1} A^T Q\_{ll}^{-1} l. \tag{6}$$

The only stochastic variable in this equation is the vector of observations *l*; according to the law of variance propagation, the cofactor matrix *Qxx* or the covariance matrix Σ*xx* of the estimated parameters *x* can be derived easily:

$$Q\_{xx} = \left(A^T Q\_{ll}^{-1} A\right)^{-1} \,, \tag{7}$$

$$
\Sigma\_{\text{xx}} = \sigma\_0^2 Q\_{\text{xx}}.\tag{8}
$$

This matrix Σ*xx* contains all the information to estimate quality measures for the coordinates of a network, more precisely estimates for precision of the adjustment parameters. In most cases, the precision of point coordinates is computed and visualized by confidence ellipses for a certain confidence level. For the example of a local tie network in Finland, the 95% confidence ellipses for the final 3D coordinates are depicted in Figure 8 and discussed in Section 5.3.
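The chain from Equation (6) to the error ellipse can be sketched in a few lines of Python. The design matrix, observations and cofactor matrix below are an invented toy example (not the Metsähovi data), intended only to show the computational steps:

```python
import numpy as np

# Toy example: n = 4 observations, u = 2 coordinate unknowns (invented data).
A = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0,  1.0],
              [1.0, -1.0]])                  # (n, u) design matrix
l = np.array([1.02, 2.01, 3.00, -0.98])      # (n,) reduced observations
Qll = np.diag([1.0, 1.0, 2.0, 2.0])          # diagonal cofactor matrix, Eq. (3)

P = np.linalg.inv(Qll)                       # weight matrix
Qxx = np.linalg.inv(A.T @ P @ A)             # cofactor matrix of unknowns, Eq. (7)
x_hat = Qxx @ A.T @ P @ l                    # adjusted parameters, Eq. (6)

v = A @ x_hat - l                            # residuals from Eq. (1)
dof = A.shape[0] - A.shape[1]
s0_sq = (v @ P @ v) / dof                    # a posteriori variance of unit weight
Sxx = s0_sq * Qxx                            # covariance matrix, Eq. (8)

# Error ellipse of the 2D point: semi-axes from the eigenvalues of Sxx.
eigval, eigvec = np.linalg.eigh(Sxx)
semi_axes = np.sqrt(eigval)
```

Scaling the semi-axes with the appropriate chi-square quantile turns this error ellipse into a confidence ellipse for a chosen confidence level.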

To estimate quantities for the reliability of observations and of coordinates, the cofactor matrix *Qvv* of the residuals *v* has to be computed, which can be done in a straightforward way by applying the law of variance propagation to Equation (1). Even if aspects of reliability are not discussed in this paper, it should be pointed out that reliability measures also depend on adequate covariance matrices Σ*xx*.

#### *2.4. Criticism of Traditional Approach*

With modern electronic sensors, it is almost impossible for users to obtain knowledge of the relevant internal measuring processes and of the computational steps already applied within the sensors. It is therefore not sufficient to follow the classical concept of deriving dispersion measures from repeated observations only. Since making a measurement nowadays often amounts to pushing a button, it is obvious that these "observations" do not contain sufficient information on the real variability of the complete measuring and processing chain. Moreover, the variable environmental conditions and the ability of the measuring team are generally not taken into account. For the stochastic properties of observations, the standard approach to developing a stochastic model ought to address the following factors:

	- Experiences (of persons who carry out the observations and computations);
	- Values given by the manufacturer of the instruments;
	- Corrections due to atmospheric conditions (temperature, air pressure, air humidity);
	- Centering to physical mark (horizontal centering, considering the instrument's height);
	- Geometric reduction to coordinate system used;
	- Application of calibration corrections.

It is almost impossible to consider all these influences in the a priori estimates for the variances and covariances in a rigorous way; therefore, it is left to the person responsible for data processing in which way, if any, these factors are included in the variance–covariance matrix (Equation (2)). With the application of the concepts of the GUM, see the following sections, one can overcome these shortcomings.

#### **3. Uncertainty Modeling According to GUM**

#### *3.1. General Idea of GUM*

In contrast to classical error analysis and quality assessment, the concept of this new GUM (Guide to the Expression of Uncertainty in Measurement) can be considered as a radical paradigm change. Within its realization, several new subtasks have to be solved as pre-analysis steps to get a complete uncertainty analysis according to this new concept.

As outlined in the last section, the traditional statistical concept, which derives dispersion measures and correlations out of repeated independent (!) observations, does not cover the complexity of today's measuring processes, see Figure 1.

**Figure 1.** Set-up of a stochastic model within classical approach: external, instrumental and personal influences are considered "implicitly", at least in a subjective way.

Considering the deficiencies within the classical error theory, on the initiative of the Bureau International des Poids et Mesures, France, an international group of metrology experts formed in 1987 to develop a new approach to adequately assess the complete uncertainty budget of different types of measurements. As a result, the "Guide to the Expression of Uncertainty in Measurement" (GUM) was published, which nowadays is the international standard in metrology, see the fundamental publications [1,2]. The GUM allows the computation of uncertainty quantities for all measuring sensors or systems. The resulting uncertainty value is a non-negative parameter characterizing the complete dispersion of a measuring quantity; uncertainty values are therefore nowadays considered to be the adequate precision parameters.

As described in the fundamental GUM documents, it is necessary to model the complete measuring and processing chain to derive a "final" resulting measuring quantity *Y* from all influencing raw data. It is important that this numerical model includes all, and really all, input quantities *X*1, *X*2, *X*<sup>3</sup> ... , that influence the final measuring result *Y*. As this model contains all the computational steps, including how the resulting quantity *Y* will be changed whenever one input quantity is modified, this basic model for GUM analysis is named the carrier of information. In a simplified form, this model can be described as an (often nonlinear) complex function

$$Y = f(X\_1, X\_2, X\_3, \dots, X\_n). \tag{9}$$

The development of this function is one of the most difficult and complex subtasks for deriving the uncertainty of measurements. Deep understanding of the physical and computational processes within the sensor, the performance of the measuring task itself, the data processing and possible external and environmental influences are necessary. To do this, no standard concept is available, just some recommendations can be given, see e.g., [6,7].
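Purely as an illustration of Equation (9), a carrier-of-information function for an EDM distance might look as follows. The correction terms and coefficients are invented placeholders, not the sensor-specific formulas the authors refer to:

```python
def distance_carrier(raw_m, temp_c, pressure_hpa, scale_ppm, zero_offset_m):
    """Hypothetical carrier of information Y = f(X1, ..., Xn) for a distance:
    every input quantity that changes the final value appears explicitly."""
    # Simplified first-velocity (atmospheric) correction in ppm;
    # coefficients are illustrative only.
    atm_ppm = 280.0 - 0.29 * pressure_hpa / (1.0 + temp_c / 273.15)
    corrected = raw_m * (1.0 + atm_ppm * 1e-6)
    # Instrument calibration: scale factor and additive zero-point offset.
    return corrected * (1.0 + scale_ppm * 1e-6) + zero_offset_m
```

Changing any single input *Xi* and re-evaluating *f* shows its effect on *Y*, which is exactly the property exploited by the Monte Carlo simulations in Section 4.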

With respect to the later discussions here, it should be mentioned that the original GUM aims to derive an uncertainty measure for just one measuring quantity *Y*.

To restrict the contents of this paper, possible variability of the measurand (the physical quantity of interest, which is measured) is not considered here, as this task would require detailed physical knowledge of the specific object.

#### *3.2. Type A and Type B Influence Factors*

An uncertainty analysis according to the GUM has a probabilistic basis, but also aims to include all available knowledge of the possible factors that may influence the measuring quantity. Consequently, it is most important to set up the following two types of influence factors, which are characterized as Type A and Type B:

#### **Type A: Dispersion values for measurements**


#### **Type B: Non-statistical effects**


The GUM concept allows us to consider classical random effects (Type A) on measuring results, which correspond to established statistical approaches.

Additionally, however, the GUM allows us to include all relevant further influence factors (Type B), e.g., external effects (due to environmental conditions and the observing team) and possibly remaining systematic errors (uncontrolled residuals from the measuring procedure, undetected instrumental effects). Even approximations used in computational formulas have to be considered, and are considered here.

#### *3.3. Assignment of Adequate Probability Distribution Functions to Variables*

The GUM concept requires the assignment of statistical distribution functions to all these influencing quantities of Type A and Type B, i.e., a specific distribution, its expectation and its dispersion. This aspect is depicted in Figure 2.

**Figure 2.** Statistical distribution functions, used within the Guide to the Expression of Uncertainty in Measurement (GUM) approach to model effects of Type A and Type B, from [8]. (**a**) Normal distribution, (**b**) uniform (rectangular) distribution and (**c**) triangular distribution.

For Type A quantities, the common probability distribution function, i.e., the Gaussian or normal distribution (with expectation μ and variance σ<sup>2</sup>), is applied, which is depicted in Figure 2a. Here, classical methods for variance estimation can be used, i.e., the statistical analysis of repeated measurements, our own or external experience, adopting data sheet information, etc.

For the non-statistical influence factors of Type B, which represent external influences and remaining systematic effects as well as insufficient approximations, it is recommended, according to, e.g., [9,10], to introduce a probability distribution function as well. However, the individual assignment of an adequate statistical distribution is a particularly complex task; in general, normal, uniform and triangular distribution functions are used, see Figure 2 and the examples in Table 1.


**Table 1.** Type A and Type B influencing factors, possible probability distribution functions and variability range for typical geodetic observations, taken from [11].

GNSS, Global Navigation Satellite System.

For each influence factor, a statistical distribution has to be defined with an expected mean and dispersion, i.e., all these quantities have to be pre-selected to serve as starting values for a complete GUM analysis. To be more specific, it is the engineers' task to estimate the variability of the applied temperature correction during the measuring period, to estimate a quantity for the centering quality, to evaluate the correctness of calibration parameters, etc.
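Such assignments become operational as soon as samples are drawn from the selected distributions. A sketch with NumPy, using placeholder magnitudes in the spirit of Table 1:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N = 10_000

# Type A: normally distributed, e.g., angle reading noise in mgon.
angle_noise = rng.normal(loc=0.0, scale=0.2, size=N)

# Type B: uniform (rectangular) distribution, e.g., a centering error in mm.
centering = rng.uniform(low=-0.5, high=0.5, size=N)

# Type B: triangular distribution, e.g., a residual calibration effect in mm.
calibration = rng.triangular(left=-0.3, mode=0.0, right=0.3, size=N)
```

Each array is one possible realization of the corresponding influence factor over *N* simulation runs.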

#### *3.4. Approach to Perform a GUM Analysis for Geodetic Observations*

For Electronic Distance Measurements (EDMs), a common geodetic measuring technique, the processing steps according to Equation (9) are depicted in Figure 3. Note that the complete mathematical formulas are not given here; just the specific computational steps are outlined.

**Figure 3.** Influence factors for electronic distance measurements: processing steps for the derivation of an input quantity for an adjustment, from [11].

The resulting distance value *Y*, i.e., the numerical measuring quantity after all necessary pre-processing steps, will serve as the input quantity for network adjustment, see Section 4.2. Within the classical approach, it is necessary to assign a dispersion value to this quantity, see Section 2.3, but here, as an alternative, Monte Carlo simulations are applied.

A simplified numerical example for the set-up of Type A and Type B effects is given in Table 1, covering the geodetic measurement types used in a local 3D geodetic network. However, each project requires an individual evaluation of these more general reference values; note that, for the numerical example in Section 5, we had to make slight changes to these reference values to account for the specific measuring conditions.

The influencing factors of Type A and Type B listed here, as well as their corresponding probability distribution functions and domains of variability, do not claim to be complete, as they do not contain additional computational influences related to the reduction to a reference height (which is always required), effects to account for the selected surveying methods, the quality of the personnel, or atmospheric and environmental influences, such as bad weather, strong insolation, etc.

The selection of distribution types and variability ranges presented here is solely a preliminary step. For a GUM analysis of GNSS observations in particular, a much more detailed study of all influencing factors has to be performed, which is a current project at the Finnish Geodetic Institute [12].

The algorithmic complexity of the set-up of Equation (9), i.e., the difficulty of finding the relevant carrier of information, makes it necessary to analyze the complete measuring process and all pre-processing steps. This problem can be visualized in a so-called Ishikawa diagram, as given in Figure 4. This frequently applied diagram, see [13], has to be filled out for each specific measurement system, which can be a laborious task, see, e.g., the recent publication [14].

**Figure 4.** Ishikawa diagram to analyze all the influence factors for a GUM analysis.

#### *3.5. Uncertainty Quantities out of GUM*

To combine all these effects of Type A and Type B within the classical GUM approach, the well-known law of variance propagation is applied, despite the fact that these are effects with different probability distribution functions. In Figure 5, this approach is explained: On the left-hand side, the different Probability Distribution Functions (PDF) are visualized, i.e., normal, uniform and triangular distribution. On the right-hand side, the formula for the law of variance propagation is shown, combining the different uncertainties *uxi* of Type A and Type B. Of course, for each influence factor an individual element *uxi* has to be considered.

**Figure 5.** Classical concept to combine Type A and Type B influence parameters within GUM [7].

Numerically therefore, the uncertainty for the final measuring quantity *Y*, see Equation (9), is derived by combining the uncertainties of Type A (*uAi*) and Type B (*uBi*) for all influence factors following the formula:

$$
u\_Y = \sqrt{u\_{A1}^2 + \dots + u\_{An}^2 + u\_{B1}^2 + \dots + u\_{Bn}^2}.\tag{10}
$$
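Numerically, Equation (10) is a plain root-sum-of-squares. With invented uncertainty components for a single distance (values in mm, chosen only for illustration):

```python
import math

# Hypothetical standard uncertainties in mm.
u_type_a = [0.6]              # e.g., repeatability of the reading
u_type_b = [0.3, 0.4, 0.2]    # e.g., centering, calibration, atmosphere

# Combined standard uncertainty according to Eq. (10).
u_y = math.sqrt(sum(u**2 for u in u_type_a + u_type_b))
```

For these placeholder values, u_y = sqrt(0.65) ≈ 0.81 mm; note that the largest single component dominates the combined uncertainty.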

Within the GUM, an extended and a complete uncertainty are introduced as well, both derived from *uY*. A discussion of the usefulness of these extensions and their computation is outside the scope of this paper.

#### *3.6. Criticism*

The application of statistical distribution functions to Type B errors and the application of the law of variance propagation to obtain an uncertainty estimate *uY* are critical points within the GUM approach [15]. The assignment of probability distribution functions to the influencing factors is a sensitive step and, of course, the application of the law of variance propagation is merely a practical method, but it allows us to stay with the established statistical methods and to perform subsequent computations.

This GUM concept is discussed within recent geodetic literature to some extent, see e.g., [13–17]. However, most of these discussions and critical remarks are limited to an uncertainty assessment for single measurements, not for complex networks or systems.

Taking into account this criticism, in a later extension of the GUM concept [2,3,18], the use of Monte Carlo simulations is recommended to find the final distribution for the quantity *Y*.

We will not follow this concept, but extend the processing model, see Figure 6, to directly study the stochastic properties of the outcome of the least squares adjustment. To our knowledge, such a network approach has not been considered yet.

**Figure 6.** GUM concept for creation of "*N*" sets of input data used in Monte Carlo (MC) simulations for least squares network adjustments.

#### **4. Monte Carlo Simulations**

#### *4.1. Basic Idea of MC Simulations*

For decades, Monte Carlo (MC) methods have been developed and used to solve complex numerical problems in a specific way, i.e., by repeated random experiments performed on a computer, see e.g., [19]. All MC computations use repeated random sampling of the input quantities, process these input data according to the existing algorithms and obtain the variability of the numerical results. For typical simulations, the repeat rate of the experiments is 1000–100,000 or more, in order to obtain the most probable distribution of the quantities of interest. This allows MC simulations to model phenomena with well-defined variability ranges for the input variables.

Nowadays, with modern computers and well-established Random Number Generators (RNG), large samples are easy to generate, see [20,21]. According to the variability of input quantities, pseudorandom sequences are computed, which allows us to evaluate and re-run simulations.

In this paper, MC simulations are applied to perform uncertainty modelling according to the GUM for the specific case of geodetic data processing, i.e., network adjustment following the traditional Gauss–Markov (GM) model. The use of MC simulations allows us to include the different Type A and Type B influence factors, which is an extension relative to the classical approach, and to combine a detailed GUM analysis of the measurement process with MC simulations in a rigorous way.

The typical pattern of an MC simulation comprises four steps: define probability distributions for all input quantities, draw pseudorandom samples from these distributions, run the deterministic computation for each sample set, and analyze the resulting distribution of the outputs.
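A minimal sketch of this pattern, using a scalar toy model rather than the network adjustment itself:

```python
import numpy as np

def run_monte_carlo(model, sample_inputs, n_runs=10_000, seed=1):
    """Generic MC loop: draw random inputs, evaluate the model, collect outputs."""
    rng = np.random.default_rng(seed)
    results = np.array([model(*sample_inputs(rng)) for _ in range(n_runs)])
    return results.mean(), results.std(ddof=1)

# Toy model Y = X1 + X2 with one normal (Type A) and one uniform (Type B) input.
def sample_inputs(rng):
    return rng.normal(10.0, 0.1), rng.uniform(-0.05, 0.05)

mc_mean, mc_std = run_monte_carlo(lambda x1, x2: x1 + x2, sample_inputs)
```

In Section 4.2, the `model` slot of such a loop is taken by the complete pre-processing and least squares adjustment of the network.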


#### *4.2. Concept to Combine MC-Simulations with GUM Analysis*

The scheme for the approach proposed here for an uncertainty assessment within least squares adjustments of geodetic networks, by a rigorous combination of MC simulations with GUM analysis, is presented in Figure 6. The starting point is an analysis of the complete pre-processing chain for each observation *li* according to the GUM, i.e., an analysis of all possible influencing factors of Type A and Type B, according to the important carrier-of-information formula, see Equation (9). For each of these influencing factors, the most probable numerical value is the starting point, often a mean value or a real observation.

As the next step, pseudorandom numbers are created for all influencing factors of each observation, taking into account their most probable value, the selected probability distribution function and the variability domain. There are numerous options for selecting a Random Number Generator (RNG). The RNG should be of high quality, i.e., it should provide a good approximation of the ideal mathematical system: long sequences, no gaps in the data, fulfilment of the distribution requirements, see [21]. As discussed in [20], RANLUX (random number generator at highest luxury level) and its recent variant RANLUX++, which are used here, can be considered representatives of such high-quality RNGs.

For each original reading and for each influence factor, a random value is created using this RNG, representing one realization of the real measuring process. By combining these effects in a consecutive way, see the simplified example in Table 2, such a randomly generated value is obtained for each input quantity of the network adjustment. With each set of input data, one least squares adjustment is performed, yielding one set of coordinate estimates as the outcome.


**Table 2.** Derivation of one input quantity for a distance, using random numbers for some influencing effects.

Repeating this complete approach for a preset number *N* (e.g., 1000 or 10,000) of simulations, the final results of a GUM-MC simulation are achieved, i.e., a set of coordinates/unknowns with their variability, which represents the uncertainty of the coordinates according to the underlying GUM analysis.

As an example, the specific manner in which the random numbers for distance observations as input quantities for the adjustment are derived is depicted in Table 2. Starting with the most probable mean value, the Type B effects of pillar variations, calibration and weather as well as the classical Type A errors are considered. In column 2, their stochastic properties are given, which are the basis for the generation of a random number, resulting in a specific modification of the distance observation, see column 3.
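In the spirit of Table 2, one simulated distance input can be generated as follows. The distribution types and magnitudes are invented placeholders, since the actual table values are not reproduced here:

```python
import numpy as np

def simulate_distance(rng, mean_distance_m):
    """One MC realization of a distance observation: the most probable value
    plus hypothetical Type B effects and classical Type A noise."""
    d = mean_distance_m
    d += rng.uniform(-0.0005, 0.0005)            # pillar/centering variation [m]
    d += rng.triangular(-0.0003, 0.0, 0.0003)    # residual calibration [m]
    d += rng.normal(0.0, 0.0002)                 # atmospheric correction error [m]
    d += rng.normal(0.0, 0.0006)                 # Type A measurement noise [m]
    return d

rng = np.random.default_rng(7)
sample = [simulate_distance(rng, 100.0) for _ in range(1000)]
```

Each call produces one input value for one adjustment run; 1000 calls per observation feed the 1000 adjustments of Section 5.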

#### **5. Application to a Local 3D Geodetic Network**

#### *5.1. Test Site "Metsähovi"*

The Metsähovi Fundamental Station belongs to the key infrastructure of the Finnish Geospatial Research Institute (FGI). Metsähovi is a basic station for the national reference system and the national permanent GNSS network. This station is part of the global network of geodetic core stations, used to maintain global terrestrial and celestial reference frames as well as to compute satellite orbits and perform geophysical studies.

Of special interest here is the character of this network to serve as "Local Tie-Vector", see [22], which is defined as a 3D-coordinate difference between the instantaneous phase centers of various space-based geodetic techniques, in this case between VLBI (Very Long Baseline Interferometry), GNSS (Global Navigation Satellite System) and SLR (Satellite Laser Ranging). All these techniques have different instruments on the site and the geometric relations between their phase centers have to be defined with extreme precision.

As depicted in Figure 7, the structure of this network is rather complex; a special difficulty is that the VLBI antenna is located inside a radome. This requires a two-step network with a connection between an outside and an inside network, which is the most critical part of the network design, but this will not be discussed here in detail.

**Figure 7.** (**a**) Fundamental station Metsähovi, Finland, WGS84 (60.217301 N, 24.394529 E); with local tie network. (**b**) Network configuration with stations 11, 31 and 180, which define local tie vectors.

As already discussed in [11], the local tie network Metsähovi consists of 31 points, where the specific local tie vectors are given by the GNSS stations 11 and 31 on the one hand and point 180, located within the radome, on the other. This station 180 is not the reference point of the VLBI antenna; it is located in its neighborhood. Here, a 3D free network adjustment is performed with the following measurement elements: 149 total station measurements (slope distances, horizontal directions and vertical angles), 48 GNSS baselines and 43 levelled height differences.

#### *5.2. Input Variables for GUM Analysis*

To get a realistic idea of the uncertainties within this network, a classical adjustment model, which combines GNSS, total station and levelling measurements, is set up for the Metsähovi network. Then, all (many) influence factors are considered according to a GUM analysis. The Monte Carlo simulation process starts with the generation of pseudorandom numbers for the influence factors, resulting in a set of input values for the adjustment. Repeating this MC simulation with 1000 runs gives the results discussed here.

The GNSS uncertainty model is just a rough idea, as in general it is difficult to simulate all influence factors, e.g., orbital errors, remaining atmospheric effects, multipath and near-field effects. Colleagues from FGI are working on developing a more realistic GUM model for GNSS, see [12].

In our approach, the local tie network is simulated with 1000 runs of pre-analysis and least squares adjustment. In each run, a new set of observations is generated and subsequently a new set of station coordinates is computed. A forced centering is assumed; therefore, in each simulation the coordinates may differ only according to the possible pillar centering variations.

The local tie vector consists of the outside stations 11 and 31 and station 180 in the radome, see above. The uncertainty estimates for these stations will be considered here in detail.

According to the GUM concept, the following influencing factors were considered. As mentioned in Section 3.4, some changes of the reference values in Table 1 were necessary to account for specific measurement conditions.

5.2.1. Type A: Classical Approach, Standard Deviations

#### **Total station observations**

For modern total stations (e.g., Leica TS30), standard deviations of 0.15 mgon for manual angle measurements and of 0.3 mgon for observations with automatic target recognition are often used. With two sets of angle observations, a precision of 0.2 mgon is assumed to be valid:


#### **GNSS Baselines**

In these computations just a rough estimate is used, neglecting correlations between baseline components:


#### **Height differences**


5.2.2. Type B: Additional Influences, including Systematic Effects, External Conditions, Insufficient Approximations, etc.

#### **Variation of pillar and centering**


#### **Variation of instrument and target height**


#### **Effects of calibration (total station instrument)**

Schwarz [14] gives possible standard deviations of calibration parameters for total stations:


#### **Effect of calibration of GNSS antenna**


#### **Effects of limited knowledge on atmospheric parameters**


#### *5.3. Final Results of GUM-Analysis and MC Simulations*

We applied the developed approach of detailed GUM analysis and full MC simulation to this network, i.e., starting with a simulation of the original influence factors and then performing an adjustment. In Figures 8 and 9, the resulting point clouds are depicted for the coordinates of stations 11, 31 and 180, which form the local tie vectors. As the Metsähovi network is a 3D geodetic network, for simplicity the resulting point clouds are visualized in the X–Y plane and the X–Z plane.


**Figure 8.** Point clouds of coordinate variations for stations 11, 31 and 180. **Left:** in X-Y plane. **Right:** in X-Z plane. The elliptical contour lines refer to a confidence level of 95%.

**Figure 9.** Histograms of the variations of (**a**) x- and (**b**) z-coordinates of station 11 after GUM and MC simulation.

This variation of all coordinate components is of special interest here, as the local tie vectors are defined between the GPS reference stations 11 and 31 and station 180 inside the radome. The original coordinate differences refer to the global Cartesian coordinate system.

#### **6. On Stochastic Properties of These Point Clouds**

The focus of this paper is the stochastic properties of the results of such an MC simulation, where the input data are randomly varied quantities following the GUM analysis concept; see the scheme in Figure 6 and the description given in Section 4.2.

For this analysis, we consider the point clouds of the final coordinates as the information of primary interest, as depicted, e.g., in Figures 8 and 9, separated into the X–Y and X–Z planes to make the findings easier to visualize. To study the stochastic properties of these results, two concepts from mathematical statistics can be considered.

#### *6.1. Cramér's Central Limit Theorem*

As early as 1946, the statistician Cramér [23] gave a rigorous probabilistic treatment of what is well known as the Central Limit Theorem (CLT). Without going into detail, the general idea of the CLT is that the properly normalized sum of independent random variables tends towards a normal distribution. This holds even if it cannot be guaranteed that the original variables themselves are normally distributed. The CLT is a fundamental concept within probability theory because it allows probabilistic and statistical methods that work for normal distributions to be applied to many additional problems involving different types of probability distributions.
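The tendency described by the CLT can be illustrated numerically; a small sketch with assumed, illustrative distributions (uniform and triangular, as also used for Type B influences in this paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# 10,000 realizations, each the sum of 12 independent non-normal variables
# (6 uniform, 6 triangular) -- the situation the CLT covers.
n = 10_000
sums = (rng.uniform(-1.0, 1.0, (n, 6)).sum(axis=1)
        + rng.triangular(-1.0, 0.0, 1.0, (n, 6)).sum(axis=1))

# Skewness and excess kurtosis of a normal distribution are both zero;
# the summed variable is already close to that.
print(round(stats.skew(sums), 2), round(stats.kurtosis(sums), 2))
```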

Following Section 4.2, during the GUM analysis a number of influence factors with different distributions are combined, which is of course more than the "properly normalized sums" required by the CLT. The question, however, is whether the resulting input variables of the adjustment are sufficiently normally distributed or, as considered here, whether the point clouds resulting from MC simulations of the adjustment with these input datasets tend towards a normal distribution.

#### *6.2. Hagen's Elementary Error Theory*

Within geodesy and astronomy, a theory of elementary errors was developed by Hagen as early as 1837 [24]. This concept states that randomly distributed residuals ε, for which we assume the validity of a normal distribution, can be considered the sum of a number *q* of very small elementary errors Δ*i*:

$$
\varepsilon = \Delta\_1 + \Delta\_2 + \Delta\_3 + \Delta\_4 + \dots + \Delta\_q. \tag{11}
$$

This equation holds—according to Hagen's discussion—if all elementary errors have similar absolute size and positive and negative values have the same probability.
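Hagen's conditions can be mimicked numerically: elementary errors of equal magnitude with equally probable sign sum to a binomial-type distribution that is close to normal for large *q*. The magnitude, *q* and sample size below are assumed for illustration only:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hagen's elementary errors: each residual is the sum of q tiny errors of
# equal magnitude delta, positive or negative with equal probability.
q, delta, n = 200, 0.01, 50_000
signs = rng.choice([-1.0, 1.0], size=(n, q))
eps = (signs * delta).sum(axis=1)

# For this sum, the theoretical standard deviation is delta * sqrt(q),
# and the empirical distribution is near-normal around zero.
print(round(eps.std(), 3), round(delta * np.sqrt(q), 3))
```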

Of course, the conditions of this theory are not strictly fulfilled by the concept considered here, i.e., a GUM analysis of influence factors with subsequent MC simulation; but the elementary error theory likewise suggests asking whether the results of the proposed approach tend towards a normal distribution.

#### *6.3. Test of MC Simulation Results on Normal Distribution*

To analyze the statistical distribution of a sample of data, numerous test statistics have been developed. For testing normality, we used the Shapiro–Wilk, Anderson–Darling, Lilliefors (Kolmogorov–Smirnov) and Shapiro–Francia test statistics. A detailed description of these tests is given in the review article [25]. The principle of all these tests is to compare the empirical distribution of the data with the theoretical probability distribution function.

Here, we perform one-dimensional tests for the variation of the x-, y- and z-components of stations 11, 31 and 180 of the Metsähovi reference network. The coordinates used are the results of 1000 MC simulations, where the input data are pre-processed according to the concept of Section 4.2.

For all stations, the above-mentioned numerical tests accept the hypothesis of a normal distribution for the simulated values. For us, this is sufficient justification to treat the results of the proposed approach as normally distributed. This means that coordinate uncertainties derived via GUM analysis and subsequent MC simulation can be used in ongoing statistical analyses, such as general quality assessment and deformation analysis, in the same way as the results of a classical adjustment. Some additional graphical information supports these comparisons. In Figure 9, the distributions of the x- and z-components of station 11 are depicted, obtained from least squares adjustments after GUM analysis and MC simulation, i.e., corresponding to the results of Figure 8. The appearance of these components is clearly close to the well-known bell curve of the normal distribution. The corresponding distributions for stations 31 and 180 were visualized as well, with very similar appearances.
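Such a test battery can be sketched as follows. The sample here is an illustrative normal stand-in for the 1000 simulated coordinate variations (the real values come from the MC adjustment runs); note that scipy offers Shapiro–Wilk and Anderson–Darling directly, while Lilliefors is approximated here by a KS test against the fitted normal (the true Lilliefors test uses corrected critical values), and Shapiro–Francia is omitted:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative stand-in for 1000 simulated x-coordinate variations [m].
coords = rng.normal(loc=0.0, scale=0.5e-3, size=1000)

# Shapiro-Wilk test.
sw_stat, sw_p = stats.shapiro(coords)

# Anderson-Darling test (scipy returns critical values, not a p-value).
ad = stats.anderson(coords, dist='norm')

# KS test against the fitted normal, as a stand-in for Lilliefors.
ks_stat, ks_p = stats.kstest(coords, 'norm',
                             args=(coords.mean(), coords.std(ddof=1)))

print(round(sw_p, 3), round(ks_p, 3),
      ad.statistic < ad.critical_values[2])  # critical value at 5%
```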

The Q–Q (quantile–quantile) plot is a graphical method for comparing two probability distributions by plotting their quantiles against each other. If the two distributions being compared are similar, the points in the Q–Q plot will lie approximately on a line.
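The quantile pairs behind such a plot can be computed directly; a sketch with an illustrative normal sample in place of the coordinate variations, using mid-rank plotting positions (one common convention among several):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(0.0, 1.0, 1000)  # stand-in for coordinate variations

# Q-Q comparison: sort the sample and pair it with the theoretical normal
# quantiles at the same (mid-rank) plotting positions.
order = np.sort(sample)
positions = (np.arange(1, sample.size + 1) - 0.5) / sample.size
theoretical = stats.norm.ppf(positions, loc=sample.mean(),
                             scale=sample.std(ddof=1))

# For a normal sample, the points lie near the identity line, so the
# correlation between the two quantile sets is close to 1.
print(round(np.corrcoef(order, theoretical)[0, 1], 3))
```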

If most of the empirical points lie within the dashed envelope, here representing a normal distribution, the assumption of a normal distribution is accepted. The confidence level for the point-wise confidence envelope is 0.95. From Figure 10, it can be seen that a normal distribution can be accepted for station 180. For stations 11 and 31, the assumption of a normal distribution is accepted as well.

**Figure 10.** Q–Q plots for comparing probability distributions of (**a**) x- and (**b**) y-coordinates of station 180: if the empirical values lie within the dashed envelope, the assumed normal distribution is accepted.

#### **7. Conclusions**

Within the classical approach of network adjustment, see e.g., [4,5], more or less rough and subjective estimates of the dispersion of the measurements are introduced. A detailed criticism of this approach is given in Section 2.4. The GUM-based approach described here starts with an analysis of the complete measuring and computational process, i.e., it considers all influence factors that enter the pre-processing steps. The GUM approach can therefore be considered to yield more realistic quantities for the precision of the input data of a geodetic network adjustment.

Using random numbers to cover the variability of these input data allows us to perform Monte Carlo simulations, which makes it possible to compute the corresponding variability or uncertainty ranges of final coordinate sets.

These variations are analyzed statistically and tend to follow a normal distribution. This allows us to derive confidence ellipsoids out of these point clouds and to continue with classical quality assessment and more detailed analysis of results, as is performed in established network theory.

Of course, these results depend on the statistical distributions selected and their variability domains during the GUM analysis of the relevant Type A and Type B influence factors. This concept allows a new way of analyzing the effects of all influence factors on the final result, i.e., on the form and size of the derived ellipsoids. For example, one can analyze the effect of a less precise total station or a better GNSS system in the same way one can study the influence of limited knowledge of the atmospheric conditions or of a more rigorous centering system. Such studies are outside the scope of this paper. In any case, the proposed approach allows a straightforward application of the GUM concept to geodetic observations and geodetic network adjustment. Optimistically, the concept presented here for deriving confidence ellipsoids from GUM–MC simulations could replace the classical methods for quality assessment in geodetic networks. The GUM approach leads to uncertainty estimates that are more realistic for modern sensors and measuring systems, and it is about time to adopt these concepts from metrology within the discipline of geodesy.

**Author Contributions:** Conceptualization, methodology and writing: W.N.; formal analysis, software development, visualization and review: D.T. Both authors have read and agreed to the published version of the manuscript.

**Funding:** Preliminary work for this project was performed within the framework of the joint research project SIB60 "Surveying" of the European Metrology Research Programme (EMRP). EMRP research is jointly funded by the participating countries within EURAMET and the European Union. The analysis of stochastic properties presented here has not received external funding.

**Acknowledgments:** The authors thank the company Geotec Geodätische Technologien GmbH, Laatzen, Germany, for allowing us to use a modification of their commercial software package PANDA for carrying out all computations presented here.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Mean Shift versus Variance Inflation Approach for Outlier Detection—A Comparative Study**

#### **Rüdiger Lehmann 1,\*, Michael Lösler <sup>2</sup> and Frank Neitzel <sup>3</sup>**


Received: 8 May 2020; Accepted: 12 June 2020; Published: 17 June 2020

**Abstract:** Outlier detection is one of the most important tasks in the analysis of measured quantities to ensure reliable results. In recent years, a variety of multi-sensor platforms has become available, which allow autonomous and continuous acquisition of large quantities of heterogeneous observations. Because the probability that such data sets contain outliers increases with the quantity of measured values, powerful methods are required to identify contaminated observations. In geodesy, the mean shift model (MS) is one of the most commonly used approaches for outlier detection. In addition to the MS model, there is an alternative approach with the model of variance inflation (VI). In this investigation, the VI approach is derived in detail, truly maximizing the likelihood functions, and is examined for the detection of one or multiple outliers. In general, the variance inflation approach is non-linear, even if the null model is linear. Thus, an analytical solution usually does not exist, except in the case of repeated measurements. The test statistic is derived from the likelihood ratio (LR) of the models. The VI approach is compared with the MS model in terms of statistical power, identifiability of actual outliers, and numerical effort. The main purpose of this paper is to examine the performance of both approaches in order to derive recommendations for the practical application of outlier detection.

**Keywords:** mean shift model; variance inflation model; outlier detection; likelihood ratio test; Monte Carlo integration; data snooping

#### **1. Introduction**

Nowadays, outlier detection in geodetic observations is part of the daily business of modern geodesists. As Rofatto et al. [1] state, we have well established and practicable methods for outlier detection for half a century, which are also implemented in current standard geodetic software. The most important toolbox for outlier detection is the so-called data snooping, which is based on the pioneering work of Baarda [2]. A complete distribution theory of data snooping, also known as DIA (detection, identification, and adaptation) method, was developed by Teunissen [3].

In geodesy, methods for outlier detection can be characterised as statistical model selection problem. A null model is opposed to one or more extended or alternative models. While the null model describes the expected stochastic properties of the data, the alternative models deviate from such a situation in one way or another. For outlier detection, the alternative models relate to the situation, where the data are contaminated by one or more outliers. According to Lehmann [4], an outlier is defined by "an observation that is so probably caused by a gross error that it is better not used or not used as it is".

From a statistical point of view, outliers can be interpreted as a small amount of data that have different stochastic properties than the rest of the data, usually a shift in the mean or an inflation of the variance of their statistical distribution. This situation is described by extra parameters in the functional or stochastic model, such as shifted means or inflated variances. Such an extended model is called an alternative model. Due to the additionally introduced parameters, the discrepancies between the observations and the related results of the model decrease w. r. t. the null model. It has to be decided whether such an improvement of the goodness of fit is statistically significant, which means that the alternative model describes the data better than the null model. This decision can be made by hypothesis testing, information criteria, or many other statistical decision approaches, as shown by Lehmann and Lösler [5,6].

The standard alternative model in geodesy is the mean shift (MS) model, in which the contamination of the observations by gross errors is modelled as a shift in the mean, i.e., by a systematic effect. This approach is described in a large number of articles and textbooks, for example, the contributions by Baarda [2], Teunissen [7], and Kargoll [8]. However, there are other options besides this standard procedure. The contamination may also be modelled as an inflation of the variance of the observations under consideration, i.e., by a random effect. This variance inflation (VI) model is rarely investigated in mathematical statistics or geodesy. Bhar and Gupta [9] propose a solution based on Cook's statistic [10]. Although this statistic was invented for the MS model, it can also be made applicable when the variance is inflated.

Thompson [11] uses the VI model for a single outlier in the framework of the restricted (or residual) maximum likelihood estimation, which is known as REML. In contrast to the true maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters and it causes less computational workload. Thompson [11] proposes that the observation with the largest log-likelihood value can be investigated as a possible outlier. Gumedze et al. [12] take up this development and set up a so-called variance shift outlier model (VSOM). Likelihood ratio (LR) and score test statistics are used to identify the outliers. The authors conclude that VSOM gives an objective compromise between including and omitting an observation, where its status as a correct or erroneous observation cannot be adequately resolved. Gumedze [13] reviews this approach and works out a one-step LR test, which is a computational simplification of the full-step LR test.

In geodesy, the VI model was introduced by Koch and Kargoll [14]. The estimation of the unknown parameters has been established as an iteratively reweighted least squares adjustment. The expectation maximization (EM) algorithm is used to detect the outliers. It is found that the EM algorithm for the VI model is very sensitive to outliers, due to its adaptive estimation, whereas the EM algorithm for the MS model provides the distinction between outliers and good observations. Koch [15] applies the method to fit a surface in three-dimensional space to the Cartesian coordinates of a point cloud obtained from measurements with a laser scanner.

The main goal of this contribution is a detailed derivation of the VI approach in the framework of outlier detection and a comparison with the well-established MS model. The performance of both approaches is to be compared in order to derive recommendations for the practical application of outlier detection. This comprises the following objectives:


4. Comparison of both approaches using the illustrative example of repeated observations, which is worked out in full detail (Section 4).

Section 5 briefly summarises the investigations that were carried out and critically reviews the results. Recommendations for the practical application of outlier detection conclude this paper.

#### **2. Null Model, Mean Shift Model, and Variance Inflation Model**

In mathematical statistics, a hypothesis *H* is a proposed explanation that the probability distribution of the random *n*-vector *y* of observations belongs to a certain parametric family *W* of probability distributions with parameter vector *θ*, e.g., Teunissen [7],

$$H: y \sim \mathcal{W}(\theta), \quad \theta \in \Theta \tag{1}$$

The parameter vector *θ* might assume values from a set Θ of admissible parameter vectors. A model is then simply the formulation of the relationship between observations *y* and parameters *θ* based on *H*. In geodesy, the standard linear model is based on the hypothesis that the observations follow a normal distribution *N*, e.g., Koch [16], Teunissen [7], i.e.,

$$H\_0: y \sim \mathcal{N}(A\mathbf{x}, \Sigma),\tag{2}$$

with the *u*-vector of functional parameters *x* and covariance matrix Σ. The latter matrix might contain further stochastic parameters, like a variance factor *σ*<sup>2</sup>, according to

$$
\Sigma = \sigma^2 Q, \tag{3}
$$

with *Q* being the known cofactor matrix of *y*. In this case, *θ* is the union of *x* and *σ*<sup>2</sup>. The covariance matrix Σ might eventually contain more stochastic parameters, known as variance components, cf. Koch ([16], p. 225ff). Matrix *A* is the *n* × *u* design matrix. The model that is based on *H*<sup>0</sup> is called the null model.

In outlier detection, we oppose *H*<sup>0</sup> with one or many alternative hypotheses, most often in the form of a mean shift (MS) hypothesis, e.g., Koch [16], Teunissen [7]

$$H\_{\rm MS}: y \sim \mathcal{N}(A\mathbf{x} + \mathbf{C}\nabla, \Sigma), \quad \nabla \neq \mathbf{0},\tag{4}$$

where ∇ is a *m*-vector of additional functional bias parameters and matrix *C* extends the design. In this case, *θ* is extended by ∇. This relationship gives rise to the MS model, where the mean of the observations is shifted from *Ax* to *Ax* + *C*∇ by the effect of gross errors, see Figure 1. *C*∇ can be interpreted as accounting for the systematic effect of gross observation errors superposing the effect of normal random observation errors already taken into account by Σ in (2). The great advantage of *HMS* is that, if the null model is linear or linearized, so is the MS model. The determination of the model parameters is numerically easy and computationally efficient, cf. Lehmann and Lösler [5].

However, there are different possibilities to set up an alternative hypothesis. The most simple one is the variance inflation hypothesis

$$H\_{VI}: y \sim \mathcal{N}(Ax, \Sigma'),\tag{5}$$

where Σ′ is a different covariance matrix, which consists of inflated variances. Σ′ can be interpreted as accounting for the joint random effect of normal observation errors in all observations and zero-mean gross errors in a few outlying observations, see Figure 2.

**Figure 1.** Schematic representation of the MS approach. The null model *H*<sup>0</sup> is depicted by a dashed grey line and the alternative model *HMS* is shown as a solid red line.

The VI model might be considered to be more adequate to describe the outlier situation when the act of falsification of the outlying observations is thought of as being a random event, which might not be exactly reproduced in a virtual repetition of the observations. However, even if the VI model might be more adequate to describe the stochastics of the observations, this does not mean that it is possible to estimate parameters or to detect outliers better than with some less adequate model like MS. This point will be investigated below.

**Figure 2.** Schematic representation of the VI approach. The null model *H*<sup>0</sup> is depicted by a dashed grey line and the alternative model *HV I* is shown as solid red line.

In the following, we will only consider the case that *y* are uncorrelated observations, where both Σ and Σ′ are diagonal matrices, such that the hypotheses read

$$H\_0: y \sim \mathcal{N}(Ax, \sigma\_1^2, \dots, \sigma\_n^2), \tag{6a}$$

$$H\_{VI}: y \sim \mathcal{N}(Ax, \tau\_1\sigma\_1^2, \dots, \tau\_m\sigma\_m^2, \sigma\_{m+1}^2, \dots, \sigma\_n^2), \quad \tau\_1 > 1, \dots, \tau\_m > 1. \tag{6b}$$

Here, *HV I* accounts for *m* random zero mean gross errors in the observations *y*1, ... , *ym* modeled by stochastic parameters *τ*1, ... , *τm*. In this case, *θ* is extended by *τ*1, ... , *τm*, which will be called variance inflation factors. Thus, *τ*1,..., *τ<sup>m</sup>* can be interpreted as a special case of extra variance components.

Note that the term variance inflation factors is used differently in multiple linear regression when dealing with multicollinearity, cf. James et al. ([17], p. 101f).

#### **3. Outlier Detection by Hypothesis Tests**

Outlier detection can be characterised as a statistical model selection problem. The null model, which describes the expected stochastic properties of the data, is opposed to one or more alternative models, which deviate from such properties. Usually, the decision, whether the null model is rejected in favour of a proper alternative model, is based on hypothesis testing. Multiple testing, consisting of a sequence of testings with one single alternative hypothesis only, is required if there are many possible alternative models. In this study, we first focus on such one single testing only. In the following subsections, the test statistics in the MS model as well as in the VI model are derived.

For the sake of simplicity, the scope of this contribution is restricted to cases, where all of the estimates to be computed are unique, such that all matrices to be inverted are regular.

#### *3.1. Mean Shift Model*

In the case of no stochastic parameters, i.e., Σ is known, the optimal test statistic *TMS*(*y*) for the test problem *H*<sup>0</sup> in (2) versus *HMS* in (4) is well known and yields, cf. Teunissen ([7], p. 76):

$$\begin{split} T\_{MS}(y) &:= \hat{\nabla}^T \Sigma\_{\hat{\nabla}}^{-1} \hat{\nabla} \\ &= \hat{e}\_0^T \Sigma^{-1} C \left( C^T \Sigma^{-1} \Sigma\_{\hat{e}\_0} \Sigma^{-1} C \right)^{-1} C^T \Sigma^{-1} \hat{e}\_0 \end{split} \tag{7}$$

where $\hat{\nabla}$ and $\Sigma\_{\hat{\nabla}}$ are the vector of estimated bias parameters in the MS model and its covariance matrix, respectively. Furthermore, $\hat{e}\_0$ and $\Sigma\_{\hat{e}\_0}$ are the vector of estimated residuals $e := y - Ax$ in the null model and its covariance matrix, respectively. The second expression offers the opportunity to perform the test purely based on the estimation in the null model, cf. Teunissen ([7], p. 75), i.e.,

$$\hat{x}\_0 = (A^T \Sigma^{-1} A)^{-1} A^T \Sigma^{-1} y, \tag{8a}$$

$$\hat{e}\_0 = (I - A(A^T \Sigma^{-1} A)^{-1} A^T \Sigma^{-1}) y, \tag{8b}$$

$$
\Sigma\_{\hat{e}\_0} = \Sigma - A \left( A^T \Sigma^{-1} A \right)^{-1} A^T. \tag{8c}
$$

Each estimation is such that the likelihood function of the model is maximized. The hat will indicate maximum likelihood estimates below.
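The equivalence of the two expressions in (7), the bias-parameter form and the purely null-model-based form using (8), can be checked numerically; a sketch with an assumed random toy design (*n* = 8, *u* = 2, *m* = 1 suspected outlier):

```python
import numpy as np

rng = np.random.default_rng(5)

n, u, m = 8, 2, 1
A = rng.normal(size=(n, u))
C = np.zeros((n, m)); C[0, 0] = 1.0          # suspected outlier in y[0]
Sigma = np.diag(rng.uniform(0.5, 2.0, n))
Si = np.linalg.inv(Sigma)
y = rng.normal(size=n)

# MS model: joint least squares in [A C] yields nabla_hat and its covariance.
B = np.hstack([A, C])
Ninv = np.linalg.inv(B.T @ Si @ B)
nabla = (Ninv @ B.T @ Si @ y)[u:]
Sig_nabla = Ninv[u:, u:]
T1 = float(nabla @ np.linalg.inv(Sig_nabla) @ nabla)

# Null-model residuals and their covariance, Eq. (8).
Nq = np.linalg.inv(A.T @ Si @ A)
e0 = (np.eye(n) - A @ Nq @ A.T @ Si) @ y
Sig_e0 = Sigma - A @ Nq @ A.T
M = C.T @ Si @ Sig_e0 @ Si @ C
T2 = float(e0 @ Si @ C @ np.linalg.inv(M) @ C.T @ Si @ e0)

print(np.isclose(T1, T2))
```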

The test statistic (7) follows the central or non-central *χ*<sup>2</sup> distributions

$$T\_{MS}|H\_0 \sim \chi^2(q,0),\tag{9a}$$

$$T\_{\rm MS}|H\_{\rm MS} \sim \chi^2(q,\lambda),\tag{9b}$$

where $q = \operatorname{rank}(\Sigma\_{\hat{\nabla}})$ is the degree of freedom and $\lambda = \nabla^T \Sigma\_{\hat{\nabla}}^{-1} \nabla$ is called the non-centrality parameter, depending on the true but unknown bias parameters ∇, e.g., (Teunissen [7], p. 77).

The test statistic (7) has the remarkable property of being uniformly most powerful invariant (UMPI). This means that, given a probability *α* of a type 1 decision error (rejection of *H*<sup>0</sup> when it is true), (7) maximizes the probability of rejecting *H*<sup>0</sup> when *HMS* is true, within the class of invariant tests.


For the original test problem, no uniformly most powerful (UMP) test exists. For more details see Arnold [18], Kargoll [8].

Lehmann and Voß-Böhme [19] prove that (7) has the property of being a UMP*χ*<sup>2</sup> test, i.e., a UMP test in the class of all tests with test statistic following a *χ*<sup>2</sup> distribution. It can be shown that (7) belongs to the class of likelihood ratio (LR) tests, where the test statistic is equivalent to the ratio

$$\frac{\max L\_0(\mathbf{x})}{\max L\_{\text{MS}}(\mathbf{x}, \nabla)}.\tag{10}$$

Here, *L*<sup>0</sup> and *LMS* denote the likelihood functions of the null and alternative model, respectively, cf. Teunissen ([7], p. 53), Kargoll [8]. (Two test statistics are said to be equivalent, if they always define the same critical region and, therefore, bring about the same decision. A sufficient condition is that one test statistic is a monotone function of the other. In this case, either both or none exceed their critical value referring to the same *α*.) The LR test is very common in statistics for the definition of a test statistic, because only a few very simple test problems permit the construction of a UMP test. A justification for this definition is provided by the famous Neyman-Pearson lemma, cf. Neyman and Pearson [20].

#### *3.2. Variance Inflation Model*

For the test problem (6a) versus (6b), no UMP test exists. Therefore, we also resort to the LR test here. We start by setting up the likelihood functions of the null and alternative model. For the null model (6a), the likelihood function reads

$$L\_0(\mathbf{x}) = (2\pi)^{-\frac{n}{2}} \prod\_{i=1}^{n} \sigma\_i^{-1} \exp\left\{-\frac{\Omega\_0}{2}\right\},\tag{11a}$$

$$
\Omega\_0 := \sum\_{i=1}^n \frac{(y\_i - a\_i x)^2}{\sigma\_i^2},
\tag{11b}
$$

and for the alternative model (6b) the likelihood function is given by

$$L\_{VI}(\mathbf{x}, \tau\_1, \dots, \tau\_m) = (2\pi)^{-\frac{n}{2}} \prod\_{i=1}^{n} \sigma\_i^{-1} \prod\_{i=1}^{m} \tau\_i^{-\frac{1}{2}} \exp\left\{-\frac{\Omega\_{VI}}{2}\right\},\tag{12a}$$

$$
\Omega\_{VI} := \sum\_{i=1}^{m} \frac{(y\_i - a\_i \mathbf{x})^2}{\sigma\_i^2 \tau\_i} + \sum\_{i=m+1}^{n} \frac{(y\_i - a\_i \mathbf{x})^2}{\sigma\_i^2},
\tag{12b}
$$

where *ai*, *i* = 1, . . . , *n* are the row vectors of *A*. According to (10), the likelihood ratio reads

$$\frac{\max L\_0(\mathbf{x})}{\max L\_{VI}(\mathbf{x}, \tau\_1, \dots, \tau\_m)} = \frac{\max \exp \left\{-\frac{\Omega\_0}{2}\right\}}{\max \prod\_{i=1}^m \tau\_i^{-\frac{1}{2}} \exp \left\{-\frac{\Omega\_{VI}}{2}\right\}}. \tag{13}$$

Equivalently, we might use the double negative logarithm of the likelihood ratio as test statistic, because it brings about the same decision as the likelihood ratio itself, i.e.,

$$T\_{VI} := -2\log\frac{\max\exp\left\{-\frac{\Omega\_0}{2}\right\}}{\max\prod\_{i=1}^m \tau\_i^{-\frac{1}{2}} \exp\left\{-\frac{\Omega\_{VI}}{2}\right\}}$$

$$= \min\Omega\_0 - \min\left\{\Omega\_{VI} + \sum\_{i=1}^m \log \tau\_i\right\}.\tag{14}$$

The first minimization result is the well-known least squares solution (8). The second minimization must be performed not only with respect to *x*, but also with respect to the unknown variance inflation factors *τ*1,..., *τm*. The latter yield the necessary conditions

$$\hat{\tau}\_i = \frac{(y\_i - a\_i \hat{x}\_{VI})^2}{\sigma\_i^2} = \frac{\hat{e}\_{VI,i}^2}{\sigma\_i^2}, \quad i = 1, \ldots, m. \tag{15}$$

This means that $\tau\_1, \ldots, \tau\_m$ are estimated such that the first *m* residuals in the VI model $\hat{e}\_{VI,i}$ equal in magnitude their inflated standard deviations $\sigma\_i \sqrt{\hat{\tau}\_i}$, and the subtrahend in (14) is obtained by

$$\begin{split} \min \left\{ \Omega\_{VI} + \sum\_{i=1}^{m} \log \tau\_{i} \right\} &= \min \left\{ m + \sum\_{i=1}^{m} \log \frac{(y\_{i} - a\_{i} x\_{VI})^{2}}{\sigma\_{i}^{2}} + \sum\_{i=m+1}^{n} \frac{(y\_{i} - a\_{i} x\_{VI})^{2}}{\sigma\_{i}^{2}} \right\} \\ &= \min \left\{ m + \sum\_{i=1}^{m} \log \frac{\hat{e}\_{VI,i}^{2}}{\sigma\_{i}^{2}} + \sum\_{i=m+1}^{n} \frac{\hat{e}\_{VI,i}^{2}}{\sigma\_{i}^{2}} \right\}. \end{split} \tag{16}$$

In the latter expression, the minimum is to be found only with respect to the free parameter vector $x\_{VI}$. This expression differs from min Ω<sup>0</sup> essentially by the logarithm of the first *m* normalized residuals. This means that those summands are down-weighted whenever the residuals $\hat{e}\_{VI,i}$ are larger in magnitude than their non-inflated standard deviations *σi*. The necessary conditions for $x\_{VI}$ are obtained by setting the first derivatives of (16) to zero and read

$$0 = \sum\_{i=1}^{m} \frac{a\_{ij}}{y\_i - a\_i \hat{x}\_{VI}} + \sum\_{i=m+1}^{n} \frac{y\_i - a\_i \hat{x}\_{VI}}{\sigma\_i^2} a\_{ij}, \quad j = 1, \ldots, u,\tag{17}$$

where *aij* denotes the *j*-th element of *ai*.

This system of equations can be rewritten as a system of polynomials of degree *m* + 1 in the parameters $\hat{x}\_{VI,i}$. In general, the solution for $\hat{x}\_{VI}$ must be computed by a numerical procedure. This extra effort is certainly a disadvantage of the VI model.

Another disadvantage is that (14) does not follow a well known probability distribution, which complicates the computation of the critical value, being the quantile of this distribution. Such a computation is best performed by Monte Carlo integration, according to Lehmann [21].
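For a concrete feel of the numerical effort, the test statistic (14) can be evaluated for the directly repeated observations of Section 4 (*u* = 1, *m* = 1), where the minimization in (16) is one-dimensional. The observation values are assumed for illustration (σ*i* = 1 throughout, with a gross error planted in the first observation); the search interval is kept away from the pole of the objective at the suspected observation's value:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Directly repeated observations, m = 1 suspected outlier in y[0];
# illustrative numbers with sigma_i = 1 for all observations.
y = np.array([8.0, 0.1, -0.2, 0.15, -0.05])
m = 1

# Null model: plain mean and min Omega_0.
w = y.mean()
omega0 = np.sum((y - w) ** 2)

# VI model, Eq. (16): minimize over the single parameter x.
def obj(x):
    return (m + np.sum(np.log((y[:m] - x) ** 2))   # suspected outliers
              + np.sum((y[m:] - x) ** 2))          # inliers

# Bounded search avoids the singularity of the log term at x = y[0].
res = minimize_scalar(obj, bounds=(-2.0, 2.0), method='bounded')
T_VI = omega0 - res.fun
tau1 = (y[0] - res.x) ** 2   # Eq. (15): estimated variance inflation factor

print(T_VI > 0, tau1 > 1)
```

With the planted outlier, the inflation factor is estimated far above 1 and the test statistic is large and positive.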

Note that the likelihood function *LV I* in (12) has poles at *τ<sup>i</sup>* = 0, *i* = 1, ... , *m*. These solutions must be excluded from consideration, because they belong to minima of (12) or equivalently to maxima of (16). (Note that log *τ<sup>i</sup>* is dominated by 1/*τ<sup>i</sup>* at *τ<sup>i</sup>* → 0).

A special issue in the VI model is what to do if $\max \hat{\tau}\_i \le 1$ is found in (15). In this case, no variance is inflated, such that *H*<sup>0</sup> must not be rejected in favour of *HV I*, see (6b). However, it might happen that *TV I* in (14) nonetheless exceeds its critical value, especially if *α* is large. In order to prevent this behaviour, we modify (14) by

$$T\_{VI}(y) := \begin{cases} 0 & \text{if } \max \hat{\tau}\_i \le 1, \\ \min \Omega\_0 - \min \left\{ \Omega\_{VI} + \sum\_{i=1}^m \log \tau\_i \right\} & \text{otherwise.} \end{cases} \tag{18}$$

If *H*<sup>0</sup> is true, then there is a small probability that $\max \hat{\tau}\_i > 1$ and, consequently, *TV I* > 0 arises, i.e.,

$$\Pr(T\_{VI} > 0 \mid H\_0) =: \alpha\_{\max}.\tag{19}$$

We see that a type 1 error probability larger than this *α*max cannot be required, i.e., contrary to the MS model, there is an upper limit for the choice of *α*.

An even bigger problem is what to do if $\min \hat{\tau}\_i < 1 < \max \hat{\tau}\_i$ is found in (15). Our argument is that, in this case, *H*<sup>0</sup> should be rejected, but possibly not in favour of *HV I* in (6b). A more suitable alternative hypothesis should be found in the framework of a multiple test.

#### **4. Repeated Observations**

There is one case which permits an analytical treatment, even of the VI model, i.e., when one scalar parameter *x* is observed directly *n* times, such that we obtain $A = (1, \ldots, 1)^T =: \mathbf{1}$. By transformation of the observations, all other models with *u* = 1 can also be mapped to this case. For compact notation, we define the weighted means of all observations and of only the last *n* − *m* inlying observations, i.e.,

$$w := \frac{\sum\_{i=1}^{n} y\_i \sigma\_i^{-2}}{\sum\_{i=1}^{n} \sigma\_i^{-2}},\tag{20a}$$

$$\mathcal{W} := \frac{\sum\_{i=m+1}^{n} y\_i \sigma\_i^{-2}}{\sum\_{i=m+1}^{n} \sigma\_i^{-2}}.\tag{20b}$$

By covariance propagation, the related variances of those expressions are obtained, i.e.,

$$\sigma\_w^2 = \frac{1}{\sum\_{i=1}^n \sigma\_i^{-2}},\tag{21a}$$

$$\sigma\_W^2 = \frac{1}{\sum\_{i=m+1}^n \sigma\_i^{-2}}.\tag{21b}$$

Using the following useful identities,

$$\frac{w}{\sigma\_w^2} = \frac{W}{\sigma\_W^2} + \sum\_{i=1}^{m} \frac{y\_i}{\sigma\_i^2},\tag{22a}$$

$$\frac{1}{\sigma\_w^2} = \frac{1}{\sigma\_W^2} + \sum\_{i=1}^{m} \frac{1}{\sigma\_i^2},\tag{22b}$$

$$\begin{split} w - W &= \sigma\_w^2 \left( \frac{W}{\sigma\_W^2} - \frac{W}{\sigma\_w^2} \right) + \sigma\_w^2 \sum\_{i=1}^{m} \frac{y\_i}{\sigma\_i^2} \\ &= \sigma\_w^2 \sum\_{i=1}^{m} \frac{y\_i - W}{\sigma\_i^2}, \end{split} \tag{22c}$$

the estimates in the null model (8) can be expressed as

$$\hat{x}\_0 = w,\tag{23a}$$

$$\hat{e}\_0 = y - \mathbf{1}w,\tag{23b}$$

$$\Sigma\_{\hat{e}\_0} = \Sigma - \mathbf{1}\mathbf{1}^T \sigma\_w^2 \tag{23c}$$

and the minimum of the sum of the squared residuals is

$$\min \Omega\_0 = \sum\_{i=1}^n \frac{(y\_i - w)^2}{\sigma\_i^2}. \tag{23d}$$
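As an illustration (our own sketch, not part of the paper), the closed-form quantities (20), (21) and (23d) can be computed directly; the function name and the example numbers are hypothetical:

```python
def null_model(y, sigma, m):
    """Weighted means (20a,b), variances (21a,b) and min Omega_0 (23d) for n
    repeated observations y with standard deviations sigma; the first m
    observations are the outlier-suspected ones."""
    pw = sum(1.0 / s ** 2 for s in sigma)            # 1/sigma_w^2, (21a)
    pW = sum(1.0 / s ** 2 for s in sigma[m:])        # 1/sigma_W^2, (21b)
    w = sum(yi / s ** 2 for yi, s in zip(y, sigma)) / pw              # (20a)
    W = sum(yi / s ** 2 for yi, s in zip(y[m:], sigma[m:])) / pW      # (20b)
    omega0 = sum((yi - w) ** 2 / s ** 2 for yi, s in zip(y, sigma))   # (23d)
    return w, W, 1.0 / pw, 1.0 / pW, omega0

# hypothetical example: one suspected outlier in front of five good observations
y = [5.1, 0.3, -0.2, 0.1, 0.4, -0.3]
s = [1.0, 0.5, 0.5, 0.5, 0.5, 0.5]
w, W, var_w, var_W, omega0 = null_model(y, s, m=1)

# identity (22c) for m = 1: w - W = sigma_w^2 (y_1 - W)/sigma_1^2
assert abs((w - W) - var_w * (y[0] - W) / s[0] ** 2) < 1e-12
```

The final assertion numerically confirms identity (22c) for this example.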

#### *4.1. Mean Shift Model*

In the MS model, the first *m* observations are falsified by bias parameters ∇1,..., ∇*m*. Matrix

$$\mathbf{C} = \begin{pmatrix} I \\ 0 \end{pmatrix} \tag{24}$$

in (4) is a block matrix of the *m* × *m* identity matrix and a (*n* − *m*) × *m* null matrix. Maximizing the likelihood function yields the estimated parameters and residuals, i.e.,

$$\hat{x}\_{MS} = W,\tag{25a}$$

$$\hat{e}\_{MS} = y - \mathbf{1}W,\tag{25b}$$

respectively, as well as the estimated bias parameters and their related covariance matrix, i.e.,

$$\hat{\nabla} = \begin{pmatrix} y\_1 - W \\ \vdots \\ y\_m - W \end{pmatrix} = \begin{pmatrix} \hat{e}\_{MS,1} \\ \vdots \\ \hat{e}\_{MS,m} \end{pmatrix},\tag{25c}$$

$$
\Sigma\_{\nabla} = \begin{pmatrix}
\sigma\_1^2 + \sigma\_W^2 & \sigma\_W^2 & \dots & \sigma\_W^2 \\
\sigma\_W^2 & \sigma\_2^2 + \sigma\_W^2 & \dots & \sigma\_W^2 \\
\vdots & \vdots & \ddots & \vdots \\
\sigma\_W^2 & \sigma\_W^2 & \dots & \sigma\_m^2 + \sigma\_W^2
\end{pmatrix} \tag{25d}$$

respectively. Note that (25d) is obtained by covariance propagation applied to (25c). By applying the Sherman–Morrison formula, cf. Sherman and Morrison [22], the inverse matrix of Σ<sub>∇̂</sub> is obtained,

$$\Sigma\_{\hat{\nabla}}^{-1} = \left[ \begin{pmatrix} \sigma\_1^2 & & \\ & \ddots & \\ & & \sigma\_m^2 \end{pmatrix} + \sigma\_W^2 \mathbf{1}\mathbf{1}^T \right]^{-1} = \begin{pmatrix} \sigma\_1^{-2} & & \\ & \ddots & \\ & & \sigma\_m^{-2} \end{pmatrix} - \frac{\sigma\_W^2}{1 + \sigma\_W^2 \sum\_{i=1}^m \sigma\_i^{-2}} \begin{pmatrix} \sigma\_1^{-2} \\ \vdots \\ \sigma\_m^{-2} \end{pmatrix} \begin{pmatrix} \sigma\_1^{-2} \\ \vdots \\ \sigma\_m^{-2} \end{pmatrix}^T = \begin{pmatrix} \sigma\_1^{-2} & & \\ & \ddots & \\ & & \sigma\_m^{-2} \end{pmatrix} - \sigma\_w^2 \begin{pmatrix} \sigma\_1^{-4} & \sigma\_1^{-2}\sigma\_2^{-2} & \dots & \sigma\_1^{-2}\sigma\_m^{-2} \\ \sigma\_1^{-2}\sigma\_2^{-2} & \sigma\_2^{-4} & \dots & \sigma\_2^{-2}\sigma\_m^{-2} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma\_1^{-2}\sigma\_m^{-2} & \sigma\_2^{-2}\sigma\_m^{-2} & \dots & \sigma\_m^{-4} \end{pmatrix}, \tag{26}$$

and the test statistic (7) in the MS model becomes

$$T\_{MS}(y) = \sum\_{i=1}^{m} \frac{\hat{e}\_{MS,i}^2}{\sigma\_i^2} - \sigma\_w^2 \sum\_{i=1}^{m} \sum\_{j=1}^{m} \frac{\hat{e}\_{MS,i}\,\hat{e}\_{MS,j}}{\sigma\_i^2 \sigma\_j^2}.\tag{27}$$
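As a purely numerical cross-check (our own, with hypothetical numbers), (27) can be compared against the quadratic form ∇̂<sup>T</sup>Σ<sub>∇̂</sub><sup>−1</sup>∇̂ built from (25d), here for *m* = 2:

```python
# hypothetical standard deviations and MS residuals (25c)
s1, s2, sW = 0.7, 1.3, 0.4
e1, e2 = 2.1, -0.9
var_w = 1.0 / (1.0 / sW**2 + 1.0 / s1**2 + 1.0 / s2**2)   # sigma_w^2 via (22b)

# T_MS from (27); the double sum collapses to (sum e_i/sigma_i^2)^2
T27 = (e1**2 / s1**2 + e2**2 / s2**2
       - var_w * (e1 / s1**2 + e2 / s2**2)**2)

# quadratic form with Sigma_nabla from (25d), inverted as a plain 2x2 matrix
a, b, d = s1**2 + sW**2, sW**2, s2**2 + sW**2
det = a * d - b * b
Tquad = (d * e1**2 - 2 * b * e1 * e2 + a * e2**2) / det

assert abs(T27 - Tquad) < 1e-12   # both routes agree, as (26) predicts
```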

According to (9), the distributions of the null model and the alternative model are given by

$$T\_{MS}|H\_0 \sim \chi^2(m, 0),\tag{28a}$$

$$T\_{\rm MS} | H\_{\rm MS} \sim \chi^2(m, \lambda), \tag{28b}$$

respectively, where the non-centrality parameter reads

$$
\lambda = \sum\_{i=1}^{m} \frac{\nabla\_i^2}{\sigma\_i^2} - \sigma\_w^2 \sum\_{i=1}^{m} \sum\_{j=1}^{m} \frac{\nabla\_i \nabla\_j}{\sigma\_i^2 \sigma\_j^2}. \tag{29}
$$

For the special cases of *m* = 1 and *m* = 2 extra bias parameters, as well as the case of independent and identically distributed random observation errors, the related test statistics (27) are given by

Case *m* = 1:

$$T\_{MS}(y) = \frac{\hat{e}\_{MS,1}^2}{\sigma\_1^2 + \sigma\_W^2},\tag{30}$$

Case *m* = 2:

$$T\_{MS}(y) = \hat{e}\_{MS,1}^2 \frac{\sigma\_1^2 - \sigma\_w^2}{\sigma\_1^4} + \hat{e}\_{MS,2}^2 \frac{\sigma\_2^2 - \sigma\_w^2}{\sigma\_2^4} - 2\sigma\_w^2 \frac{\hat{e}\_{MS,1}\hat{e}\_{MS,2}}{\sigma\_1^2 \sigma\_2^2},\tag{31}$$

Case *σ*<sub>1</sub> = *σ*<sub>2</sub> = ··· = *σ<sub>n</sub>* =: *σ*:

$$T\_{MS}(y) = \frac{1}{\sigma^2} \left( \sum\_{i=1}^{m} \hat{e}\_{MS,i}^2 - \frac{1}{n} \sum\_{i=1}^{m} \sum\_{j=1}^{m} \hat{e}\_{MS,i}\,\hat{e}\_{MS,j} \right). \tag{32}$$

In the case of *σ<sub>w</sub>* ≪ min *σ<sub>i</sub>*, which often arises when *m* ≪ *n*, the test statistic tends to

$$T\_{MS}(y) \to \sum\_{i=1}^{m} \frac{\hat{\epsilon}\_{MS,i}^2}{\sigma\_i^2}. \tag{33}$$

#### *4.2. Variance Inflation Model—General Considerations*

In the VI model, the first *m* observations are falsified by variance inflation factors *τ*1, ... , *τm*. The necessary condition (17) reads

$$0 = \sum\_{i=1}^{m} \frac{1}{y\_i - \hat{x}\_{VI}} + \sum\_{i=m+1}^{n} \frac{y\_i - \hat{x}\_{VI}}{\sigma\_i^2}. \tag{34}$$

Using (20b) and (21b), this can be rewritten to

$$\hat{x}\_{VI} - W = \sum\_{i=1}^{m} \frac{\sigma\_W^2}{y\_i - \hat{x}\_{VI}}.\tag{35}$$

This solution *x̂<sub>VI</sub>* is obtained as a real root of a polynomial of degree *m* + 1, which might have at most *m* + 1 real solutions. In this model, it is easy to exclude the case *y<sub>i</sub>* = *y<sub>j</sub>*, *i* ≠ *j*, because such observations are either both outliers or both good observations and should be merged into one observation. Let us order the outlier-suspected observations as follows: *y*<sub>1</sub> < *y*<sub>2</sub> < ··· < *y<sub>m</sub>*. We see that


Therefore, (35) always has at least one real solution *x̂<sub>VI</sub>* in each interval *y*<sub>*i*−1</sub> ... *y<sub>i</sub>*, one of which must be a maximum of (16), because (16) goes from −∞ up to some maximum and then down again to −∞ in this interval. Besides these *m* − 1 uninteresting solutions, (35) can have either no or two more real solutions, except in rare cases, where it might have exactly one more real solution. If *W* < *y*<sub>1</sub>, then there are no solutions above *y<sub>m</sub>*. If *W* > *y<sub>m</sub>*, then there are no solutions below *y*<sub>1</sub>.

From these considerations it becomes clear that (35) can have, at most, one solution that is a minimum of (16), see also Figure 3.

The second-order sufficient condition for a strict local minimum of (16) is that the Hessian matrix *H* of (16),

$$H(x\_{VI}, \tau\_1, \dots, \tau\_m) = 2 \begin{pmatrix} \frac{1}{\sigma\_W^2} + \sum\_{i=1}^m \frac{1}{\tau\_i \sigma\_i^2} & \frac{y\_1 - x\_{VI}}{\tau\_1^2 \sigma\_1^2} & \cdots & \frac{y\_m - x\_{VI}}{\tau\_m^2 \sigma\_m^2} \\ \frac{y\_1 - x\_{VI}}{\tau\_1^2 \sigma\_1^2} & \frac{(y\_1 - x\_{VI})^2}{\tau\_1^3 \sigma\_1^2} - \frac{1}{2\tau\_1^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \frac{y\_m - x\_{VI}}{\tau\_m^2 \sigma\_m^2} & 0 & \cdots & \frac{(y\_m - x\_{VI})^2}{\tau\_m^3 \sigma\_m^2} - \frac{1}{2\tau\_m^2} \end{pmatrix},\tag{36}$$

must be positive definite at *x̂<sub>VI</sub>*, *τ̂*<sub>1</sub>, ..., *τ̂<sub>m</sub>*, cf. Nocedal and Wright ([23] p. 16), i.e.,

$$H(\hat{x}\_{VI}, \hat{\tau}\_1, \dots, \hat{\tau}\_m) = 2 \begin{pmatrix} \frac{1}{\sigma\_W^2} + \sum\_{i=1}^m \frac{1}{\hat{\tau}\_i \sigma\_i^2} & \frac{y\_1 - \hat{x}\_{VI}}{\hat{\tau}\_1^2 \sigma\_1^2} & \dots & \frac{y\_m - \hat{x}\_{VI}}{\hat{\tau}\_m^2 \sigma\_m^2} \\ \frac{y\_1 - \hat{x}\_{VI}}{\hat{\tau}\_1^2 \sigma\_1^2} & \frac{1}{2\hat{\tau}\_1^2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \frac{y\_m - \hat{x}\_{VI}}{\hat{\tau}\_m^2 \sigma\_m^2} & 0 & \dots & \frac{1}{2\hat{\tau}\_m^2} \end{pmatrix}. \tag{37}$$

A practical test for positive definiteness that does not require explicit calculation of the eigenvalues is the leading principal minor test, also known as Sylvester's criterion. The *k*-th leading principal minor is the determinant of the submatrix formed by deleting the last *n* − *k* rows and columns of the matrix. A symmetric *n* × *n* matrix is positive definite if and only if all *n* leading principal minors are positive, cf. Prussing [24], Gilbert [25]. Invoking Schur's determinant identity, i.e.,

$$\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(D)\det(A - BD^{-1}C)\tag{38}$$

and in combination with (15), we see that the *k*-th leading principal minor of *H* in (37) is

$$\prod\_{i=1}^{k} \left(\frac{1}{2\hat{\tau}\_i^2}\right) \left(\frac{1}{\sigma\_W^2} + \sum\_{i=1}^{m} \frac{1}{\hat{\tau}\_i\sigma\_i^2} - \sum\_{i=1}^{k} \frac{2}{\hat{\tau}\_i\sigma\_i^2}\right). \tag{39}$$

For (39) to be positive, the second factor must be positive for each *k*. Obviously, if this holds for *k* = *m*, it also holds for all other *k*. Therefore, the necessary and sufficient condition for a local minimum of (16) reads

$$\sum\_{i=1}^{m} \frac{\sigma\_W^2}{\hat{\tau}\_i \sigma\_i^2} = \sum\_{i=1}^{m} \frac{\sigma\_W^2}{(y\_i - \hat{x}\_{VI})^2} < 1. \tag{40}$$

In other words, if and only if *x̂<sub>VI</sub>* is sufficiently far away from all outlying observations, it belongs to a strict local minimum of (16).
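The root structure of (35) and the filter (40) can be explored numerically. The following sketch is our own (grid scanning plus bisection, with hypothetical example values); the paper itself treats (35) as a polynomial root problem:

```python
def vi_roots(y_out, W, var_W, lo, hi, steps=200000):
    """All real solutions of (35) on [lo, hi], located by sign changes of
    f(x) = x - W - sum_i sigma_W^2/(y_i - x) on a grid, refined by bisection.
    Grid points hitting an observation exactly (a pole of f) are skipped."""
    def f(x):
        return x - W - sum(var_W / (yi - x) for yi in y_out)
    roots, xp, fp = [], lo, f(lo)
    for k in range(1, steps + 1):
        xn = lo + (hi - lo) * k / steps
        try:
            fn = f(xn)
        except ZeroDivisionError:          # pole exactly on the grid
            xp, fp = xn, float("nan")
            continue
        if fp == fp and fp * fn < 0.0:     # genuine sign change
            a, b = xp, xn
            for _ in range(80):            # bisection refinement
                c = 0.5 * (a + b)
                if f(a) * f(c) <= 0.0:
                    b = c
                else:
                    a = c
            roots.append(0.5 * (a + b))
        xp, fp = xn, fn
    return roots

def is_minimum(x, y_out, var_W):
    """Sufficient condition (40) for a strict local minimum of (16)."""
    return sum(var_W / (yi - x) ** 2 for yi in y_out) < 1.0

# hypothetical example: two suspected outliers far from the good observations
y_out, W, var_W = [4.0, 5.0], 0.0, 0.1 ** 2
roots = vi_roots(y_out, W, var_W, -2.0, 8.0)
mins = [x for x in roots if is_minimum(x, y_out, var_W)]
assert len(mins) == 1      # at most one root is a minimum of (16)
```

In this example, three real roots appear (the maximal number *m* + 1 = 3), but only the one near *W* passes (40).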

Using (23d) and (16), the test statistic (18) in the VI model for max *τ̂<sub>i</sub>* > 1 reads

$$\begin{split} T\_{VI} &= \min \Omega\_0 - \min \left\{ \Omega\_{VI} + \sum\_{i=1}^m \log \tau\_i \right\} \\ &= \sum\_{i=1}^n \frac{(y\_i - w)^2}{\sigma\_i^2} - m - \sum\_{i=1}^m \log \frac{(y\_i - \hat{x}\_{VI})^2}{\sigma\_i^2} - \sum\_{i=m+1}^n \frac{(y\_i - \hat{x}\_{VI})^2}{\sigma\_i^2} \\ &= \sum\_{i=1}^n \left( \frac{(y\_i - w)^2}{\sigma\_i^2} - \frac{(y\_i - \hat{x}\_{VI})^2}{\sigma\_i^2} \right) - m + \sum\_{i=1}^m \left( \frac{(y\_i - \hat{x}\_{VI})^2}{\sigma\_i^2} - \log \frac{(y\_i - \hat{x}\_{VI})^2}{\sigma\_i^2} \right) \\ &= (\hat{x}\_{VI} - w) \sum\_{i=1}^n \frac{2y\_i - \hat{x}\_{VI} - w}{\sigma\_i^2} - m + \sum\_{i=1}^m \left( \hat{\tau}\_i - \log \hat{\tau}\_i \right) \\ &= -\frac{(\hat{x}\_{VI} - w)^2}{\sigma\_w^2} - m + \sum\_{i=1}^m \left( \hat{\tau}\_i - \log \hat{\tau}\_i \right). \end{split}\tag{41}$$

Tracing back the computations, it becomes clear that *T<sub>VI</sub>* depends on the observations only through *y*<sub>1</sub>, ... , *y<sub>m</sub>* and *w*, or equivalently through *y*<sub>1</sub>, ... , *y<sub>m</sub>* and *W*. These *m* + 1 quantities represent a so-called "sufficient statistic" for the outlier test.

**Figure 3.** Illustration of two exemplary cases of the solution of (35). The dash-dot red lines indicate the right-hand side of (35), the dashed black line the left-hand side. A dark red circle symbolises the desired solution out of five.

An interesting result is obtained, when we consider the case *σ<sup>W</sup>* → 0, which occurs if the *n* − *m* good observations contain much more information than the *m* suspected outliers. In this case, (35) can be rewritten as

$$(\hat{x}\_{VI} - W) \prod\_{i=1}^{m} (y\_i - \hat{x}\_{VI}) = \sigma\_W^2\, P(y\_1, \dots, y\_m, \hat{x}\_{VI}),\tag{42}$$

where *P* is some polynomial. If the right-hand side goes to zero, at least one factor of the left-hand side must also go to zero. As *σ<sub>W</sub>* → 0, we obtain *m* + 1 solutions for *x̂<sub>VI</sub>*, approaching *W*, *y*<sub>1</sub>, ... , *y<sub>m</sub>*. The first solution can be a valid VI solution; the others are invalid as *τ̂<sub>i</sub>* → 0. Note that we have *x̂<sub>VI</sub>* → *x̂<sub>MS</sub>* in this case, and also *ê<sub>VI,i</sub>* → *ê<sub>MS,i</sub>* for all *i* = 1, ... , *m*. Having (22c) in mind, we see that (41) becomes

$$T\_{VI} \rightarrow -\frac{(W - w)^2}{\sigma\_w^2} - m + \sum\_{i=1}^m \left(\hat{\tau}\_i - \log \hat{\tau}\_i\right) \tag{43}$$

$$= -\sigma\_w^2 \left( \sum\_{i=1}^{m} \frac{y\_i - W}{\sigma\_i^2} \right)^2 - m + \sum\_{i=1}^{m} \left( \hat{\tau}\_i - \log \hat{\tau}\_i \right) \tag{44}$$

and also noting that with (22b) we find *σ<sup>w</sup>* → 0, such that

$$T\_{VI} \rightarrow -m + \sum\_{i=1}^{m} \left(\hat{\tau}\_i - \log \hat{\tau}\_i\right) \tag{45}$$

$$= -m + \sum\_{i=1}^{m} \left( \frac{\hat{e}\_{MS,i}^2}{\sigma\_i^2} - \log \frac{\hat{e}\_{MS,i}^2}{\sigma\_i^2} \right).\tag{46}$$

When comparing this to the equivalent result in the MS model (33), we see that *T<sub>VI</sub>* and *T<sub>MS</sub>* are equivalent test statistics under the sufficient condition min *τ̂<sub>i</sub>* > 1, because *τ* − log *τ* is a monotonic function for *τ* > 1. This means that, in this case, the decision on *H*<sub>0</sub> is the same in both the MS and the VI model. However, max *τ̂<sub>i</sub>* > 1 might not be sufficient for this property.

#### *4.3. Variance Inflation Model—Test for One Outlier*

In the case *m* = 1, (35) reads

$$\hat{x}\_{VI} = \frac{\sigma\_W^2}{y\_1 - \hat{x}\_{VI}} + W. \tag{47}$$

Rewriting this to a quadratic equation yields up to two solutions, i.e.,

$$\hat{x}\_{VI} = \frac{y\_1 + W}{2} \pm \sqrt{\frac{(y\_1 - W)^2}{4} - \sigma\_W^2}. \tag{48}$$

With (15), we find

$$\hat{\tau}\_1 = \frac{1}{\sigma\_1^2} \left( \frac{y\_1 - W}{2} \pm \sqrt{\frac{(y\_1 - W)^2}{4} - \sigma\_W^2} \right)^2 = \frac{1}{\sigma\_1^2} \left( \frac{\hat{e}\_{MS,1}}{2} \pm \sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2} \right)^2. \tag{49}$$

For a solution to exist at all, we must have |*ê<sub>MS,1</sub>*| ≥ 2*σ<sub>W</sub>*. This means that *y*<sub>1</sub> must be sufficiently outlying; otherwise, *H*<sub>0</sub> is to be accepted.

The condition (40) for a strict local minimum reads here

$$\sigma\_W^2 < \hat{\tau}\_1 \sigma\_1^2 = \left(\frac{\hat{e}\_{MS,1}}{2} \pm \sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2}\right)^2. \tag{50}$$

For the sign of the square root equal to the sign of *ê<sub>MS,1</sub>*, this inequality is trivially fulfilled. For the opposite sign, we require

$$\sigma\_W < \frac{|\hat{e}\_{MS,1}|}{2} - \sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2}. \tag{51}$$

Rewriting this expression yields

$$\sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2} < \frac{|\hat{e}\_{MS,1}|}{2} - \sigma\_W \tag{52}$$

and squaring both sides, which is admissible because both are positive, we readily arrive at |*ê<sub>MS,1</sub>*| < 2*σ<sub>W</sub>*, which is the case in which no solution exists. Therefore, we have exactly one minimum of (16), i.e.,

$$\hat{x}\_{VI} = \frac{y\_1 + W}{2} - \operatorname{sign}(y\_1 - W)\sqrt{\frac{(y\_1 - W)^2}{4} - \sigma\_W^2} \tag{53a}$$

$$= \frac{y\_1 + W}{2} - \operatorname{sign}(\hat{e}\_{MS,1})\sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2},$$

$$\hat{\tau}\_1 = \frac{1}{\sigma\_1^2} \left(\frac{\hat{e}\_{MS,1}}{2} + \operatorname{sign}(\hat{e}\_{MS,1})\sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2}\right)^2,\tag{53b}$$

and the test statistic (18) becomes

$$T\_{VI} = \begin{cases} 0 & \text{if } \hat{\tau}\_1 \le 1, \\ -\frac{(\hat{x}\_{VI} - w)^2}{\sigma\_w^2} - 1 + \hat{\tau}\_1 - \log \hat{\tau}\_1 & \text{otherwise.} \end{cases} \tag{54}$$

The condition *τ̂*<sub>1</sub> > 1 is equivalent to

$$\sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2} > \sigma\_1 - \frac{|\hat{e}\_{MS,1}|}{2},\tag{55}$$

which is trivially fulfilled, if the right hand side is negative. If it is non-negative, both sides can be squared and rearranged to

$$|\hat{e}\_{MS,1}| > \frac{\sigma\_1^2 + \sigma\_W^2}{\sigma\_1}.\tag{56}$$

Since this condition also covers the case that |*ê<sub>MS,1</sub>*| > 2*σ*<sub>1</sub>, it can be used alone as an equivalent of *τ̂*<sub>1</sub> > 1.

With (22c) we see that both

$$\begin{split} \hat{x}\_{VI} - w &= \frac{y\_1 - W}{2} - \operatorname{sign}(\hat{e}\_{MS,1}) \sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2} - \sigma\_w^2 \frac{y\_1 - W}{\sigma\_1^2} \\ &= \hat{e}\_{MS,1} \left( \frac{1}{2} - \frac{\sigma\_w^2}{\sigma\_1^2} \right) - \operatorname{sign}(\hat{e}\_{MS,1}) \sqrt{\frac{\hat{e}\_{MS,1}^2}{4} - \sigma\_W^2} \end{split} \tag{57}$$

as well as *τ̂*<sub>1</sub> through (53b) depend on the observations only through *ê<sub>MS,1</sub>*, and so does *T<sub>VI</sub>* in (54). On closer examination, we see that *T<sub>VI</sub>* in (54) depends only on |*ê<sub>MS,1</sub>*|. This clearly holds as well for *T<sub>MS</sub>* in (30). Therefore, both test statistics are equivalent if *T<sub>VI</sub>* can be shown to be a strictly monotone function of *T<sub>MS</sub>*.

Figure 4 shows that (54) as a function of *ê<sub>MS,1</sub>* is monotone. A mathematical proof of monotonicity is given in Appendix A. Thus, it is also monotone as a function of *T<sub>MS</sub>*, and even strictly monotone for *τ̂*<sub>1</sub> > 1, which is the case we are interested in. Therefore, the MS model and the VI model are fully equivalent for repeated observations to be tested for *m* = 1 outlier.
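The *m* = 1 pipeline, combining (53b), (57) and (54), can be condensed into a short routine. The following is our own sketch with hypothetical numbers; the numerical check of the threshold (56) at the end is ours as well:

```python
import math

def t_vi_m1(e_ms1, s1, var_W, var_w):
    """VI test statistic (54) for m = 1. e_ms1 is the MS residual
    hat e_MS,1 = y_1 - W; var_W, var_w are sigma_W^2 and sigma_w^2 from (21)."""
    disc = e_ms1 ** 2 / 4.0 - var_W
    if disc < 0:                       # |e| < 2 sigma_W: no VI solution exists
        return 0.0
    root = math.copysign(math.sqrt(disc), e_ms1)
    tau1 = (e_ms1 / 2.0 + root) ** 2 / s1 ** 2          # (53b)
    if tau1 <= 1.0:
        return 0.0                                       # first branch of (54)
    x_minus_w = e_ms1 * (0.5 - var_w / s1 ** 2) - root   # (57)
    return -x_minus_w ** 2 / var_w - 1.0 + tau1 - math.log(tau1)

# hypothetical setting; (56) predicts the switch-on point of the statistic
s1, sW = 1.0, 0.3
var_W = sW ** 2
var_w = 1.0 / (1.0 / var_W + 1.0 / s1 ** 2)              # (22b) for m = 1
thr = (s1 ** 2 + sW ** 2) / s1                           # threshold from (56)
assert t_vi_m1(0.99 * thr, s1, var_W, var_w) == 0.0
assert t_vi_m1(1.01 * thr, s1, var_W, var_w) > 0.0
```

The two assertions confirm numerically that *T<sub>VI</sub>* becomes positive exactly when |*ê<sub>MS,1</sub>*| crosses the bound in (56).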

**Figure 4.** Resulting test statistic *T<sub>VI</sub>* of the variance inflation (VI) approach (54) as a function of the residual *ê<sub>MS,1</sub>* for various ratios of *σ<sub>w</sub>*/*σ*<sub>1</sub>.

Finally, we numerically determine the probability distribution of the test statistic (54) by Monte Carlo integration, using 10<sup>7</sup> pseudo random samples of *ê<sub>MS,1</sub>*. In Figure 5, the positive branch of the symmetric probability density function (PDF) is given in logarithmic scale for various ratios of *σ<sub>W</sub>*/*σ*<sub>1</sub>. However, only about 30% of the probability mass is located under this curve; the rest is concentrated at *T<sub>VI</sub>* = 0 and is not displayed. The quantiles of this distribution determine the critical values and also *α*<sub>max</sub> in (19). The results are summarized in Table 1.
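Since *T<sub>VI</sub>* > 0 is equivalent to (56) for *m* = 1, the quantity *α*<sub>max</sub> in (19) can be approximated by a quick Monte Carlo sketch. This is our own illustration; the sample size, seed, and the ratio *σ<sub>W</sub>*/*σ*<sub>1</sub> = 0.5 are arbitrary choices:

```python
import math, random

# Under H0, hat e_MS,1 ~ N(0, sigma_1^2 + sigma_W^2), cf. (28a) with (30),
# and by (56) T_VI > 0 holds exactly when |hat e_MS,1| > (s1^2 + sW^2)/s1.
random.seed(42)
s1, sW = 1.0, 0.5
sigma_e = math.sqrt(s1 ** 2 + sW ** 2)
thr = (s1 ** 2 + sW ** 2) / s1
n_samples = 200000
hits = sum(abs(random.gauss(0.0, sigma_e)) > thr for _ in range(n_samples))
alpha_max = hits / n_samples

# closed-form counterpart: 2 Phi(-thr/sigma_e) = erfc(thr/(sigma_e*sqrt(2)))
alpha_exact = math.erfc(thr / (sigma_e * math.sqrt(2.0)))
assert abs(alpha_max - alpha_exact) < 0.01
```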

**Figure 5.** Probability density function of *T<sub>VI</sub>* (54) as a function of the residual *ê<sub>MS,1</sub>* under *H*<sub>0</sub>, approximated by the MCM. The vertical axis is in logarithmic scale. The strong peaks at 0 for *τ̂*<sub>1</sub> ≤ 1 are not displayed. The dashed black and the solid red curve relate to *σ<sub>W</sub>*/*σ*<sub>1</sub> = 0.05 and *σ<sub>W</sub>*/*σ*<sub>1</sub> = 0.5, respectively.

**Table 1.** Maximum selectable type 1 error probability *α*max and critical value *c<sup>α</sup>* for *TV I* in (54) for various ratios of *σW*/*σ*<sup>1</sup> and *α* = 0.05.


#### *4.4. Variance Inflation Model—Test for Two Outliers*

In the case *m* = 2 outliers, (35) reads

$$\hat{x}\_{VI} - W = \frac{\sigma\_W^2}{y\_1 - \hat{x}\_{VI}} + \frac{\sigma\_W^2}{y\_2 - \hat{x}\_{VI}}.\tag{58}$$

The solution for *x̂<sub>VI</sub>* can be expressed in terms of a cubic equation, which permits an analytical solution. One real solution must be in the interval *y*<sub>1</sub> ... *y*<sub>2</sub>, but there may be two more solutions. In rare cases, solutions may also coincide. The analytical expressions are very complicated and do not permit a treatment analogous to the preceding subsection. Therefore, we have to rely fully on numerical methods, which, in our case, is the Monte Carlo method (MCM).

First, we compute the critical values of the test statistic (18)

$$T\_{VI} = \begin{cases} 0 & \text{if } \max(\hat{\tau}\_1, \hat{\tau}\_2) \le 1, \\ -\frac{(\hat{x}\_{VI} - w)^2}{\sigma\_w^2} - 2 + \hat{\tau}\_1 + \hat{\tau}\_2 - \log \hat{\tau}\_1 - \log \hat{\tau}\_2 & \text{otherwise.} \end{cases} \tag{59}$$

The MCM is performed using 10<sup>7</sup> pseudo random samples. We restrict ourselves to the case *σ*<sub>1</sub> = *σ*<sub>2</sub>. The maximum selectable type 1 error probabilities *α*<sub>max</sub> are summarized in Table 2. It is shown that *α*<sub>max</sub> is mostly larger than for *m* = 1. The reason is that we obtain *τ̂<sub>i</sub>* > 1 more often, even if *H*<sub>0</sub> is true, which makes it easier to define critical values in a meaningful way. Moreover, Table 2 indicates the probabilities that, under *H*<sub>0</sub>, none, one, or both variances are inflated. It is shown that, if the good observations contain the majority of the information, a minimum exists, but, contrary to our expectation, the case max(*τ̂*<sub>1</sub>, *τ̂*<sub>2</sub>) ≤ 1 is not typically the dominating case.

The important result is what happens if *H*<sub>0</sub> is false because variances are truly inflated. The probability that *H*<sub>0</sub> is rejected is known as the power 1 − *β* of the test, where *β* is the probability of a type 2 decision error. It is computed both with *T<sub>MS</sub>* in (31) and with *T<sub>VI</sub>* in (59). Table 3 provides the results. It is shown that the power of *T<sub>MS</sub>* is always better than that of *T<sub>VI</sub>*. This is unexpected, because *T<sub>MS</sub>* is not equivalent to the likelihood ratio of the VI model.

A possible explanation of the low performance of *T<sub>VI</sub>* in (59) is that, in many cases, the likelihood function *L<sub>VI</sub>* has no local maximum, such that (16) has no local minimum. Even for an extreme variance inflation of *τ*<sub>1</sub> = *τ*<sub>2</sub> = 5, this occurs with a remarkable probability of 0.14. Moreover, the probability that max(*τ̂*<sub>1</sub>, *τ̂*<sub>2</sub>) ≤ 1 is hardly less than that. In both cases, *H*<sub>0</sub> cannot be rejected.


**Table 2.** Maximum selectable type 1 error probability *α*max and critical value *c<sup>α</sup>* for *TV I* in (59) for various ratios of *σW*/*σ*<sup>1</sup> = *σW*/*σ*<sup>2</sup> and *α* = 0.05 as well as probabilities that (16) has no local minimum or that 0 or 1 or 2 variances are inflated.

**Table 3.** Test power 1 − *β<sub>MS</sub>* for the test using *T<sub>MS</sub>* in (31) and test power 1 − *β<sub>VI</sub>* for the test using *T<sub>VI</sub>* in (59) for various true values of the variance inflation factors *τ*<sub>1</sub>, *τ*<sub>2</sub> for *σ<sub>W</sub>* = 0.1 · *σ*<sub>1</sub> = 0.1 · *σ*<sub>2</sub> and *α* = 0.05, as well as probabilities that (16) has no local minimum or that 0, 1, or 2 variances are inflated.


#### *4.5. Outlier Identification*

If it is not known which observations are outlier-suspected, a multiple test must be set up. If the critical values are identical in all tests, then we simply have to look for the largest test statistic. This is the case for *TMS* when considering the same number *m* of outlier-suspected observations, see (9). If we even consider different numbers *m* in the same multiple test, we have to apply the *p*-value approach, cf. Lehmann and Lösler [5].

In the VI model, the requirement of identical critical values is rarely met. It is, in general, not met for repeated observations, not even for *m* = 1, as can be seen in Figure 5. However, in this case, this is no problem, because the test with *T<sub>VI</sub>* in (54) is equivalent to *T<sub>MS</sub>* in (30), as demonstrated. This also means that the same outlier is identified with both test statistics.

For repeated observations, we find identical critical values only for identical variances of the outlier-suspected observations, such that those observations are fully indistinguishable from each other. For example, for *n* = 27, *m* = 2, *α* = 0.05, and *σ*<sup>1</sup> = ··· = *σn*, we find *<sup>σ</sup>W*/*σ<sup>i</sup>* = 0.20 and *c<sup>α</sup>* = 2.32 for all 351 pairs of outlier-suspected observations, see Table 2.

We evaluate the identifiability of *m* = 2 outliers in *n* = 10 and *n* = 20 repeated observations using the MCM. In each of the 10<sup>6</sup> repetitions, random observations with equal variances are generated. Two cases are considered: in the first case, two variances are inflated by *τ* according to the VI model; in the second case, two observation values are shifted by ∇ according to the MS model. Using (31) and (59), the test statistics *T<sub>MS</sub>* and *T<sub>VI</sub>* are computed for all *n*(*n* − 1)/2 = 45 or 190 pairs of observations. If the maximum of the test statistic is attained for the actually modified pair of observations, the test statistic correctly identifies the outliers. Here, we assume that *α* is large enough for the critical value to be exceeded; apart from this, the results are independent of the choice of *α*.
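A scaled-down version of this identification experiment can be sketched as follows (far fewer repetitions than the 10<sup>6</sup> above; all names, counts, and the shift are our own choices):

```python
import random
random.seed(7)

def t_ms_pair(y, i, j, sigma=1.0):
    """T_MS from (32) for the suspected pair (i, j) in repeated observations
    of equal variance sigma^2."""
    rest = [y[k] for k in range(len(y)) if k not in (i, j)]
    W = sum(rest) / len(rest)                 # (20b) for equal variances
    e = (y[i] - W, y[j] - W)                  # residuals (25c)
    return (e[0]**2 + e[1]**2 - (e[0] + e[1])**2 / len(y)) / sigma**2

def identified(n=10, shift=5.0):
    """One repetition: shift the pair (0, 1) and check whether it attains
    the maximum of T_MS over all n(n-1)/2 pairs."""
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    y[0] += shift
    y[1] += shift
    best = max(((i, j) for i in range(n) for j in range(i + 1, n)),
               key=lambda p: t_ms_pair(y, *p))
    return best == (0, 1)

trials = 2000
rate = sum(identified() for _ in range(trials)) / trials
assert rate > 0.8   # a 5-sigma shift is identified with high probability
```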


**Table 4.** Success probabilities for outlier identification in repeated observations of equal variance *σ*<sup>2</sup> with two outliers.

As expected, the success probabilities increase as *τ* or ∇ grows. However, in both cases, *T<sub>MS</sub>* outperforms *T<sub>VI</sub>*. In Figure 6, the ratio *r<sub>T</sub>* of the success probabilities between the VI and the MS approach is depicted for *n* = 10 repeated observations. If *r<sub>T</sub>* > 1, the success rate of *T<sub>VI</sub>* is higher than that of *T<sub>MS</sub>*, and vice versa. The ratio is always *r<sub>T</sub>* < 1 and tends to 1, as shown in Figure 6. Therefore, the success probability of the MS approach is higher than that of the VI approach, even if the outliers are caused by an inflation of the variances.

**Figure 6.** Ratio *rT* of the success probabilities between the VI and the MS approach using *n* = 10 repeated observations. The top figure depicts *rT* for the case of inflated variances. The bottom figure depicts *rT* for the case of shifted mean values.

#### **5. Conclusions**

We have studied the detection of outliers in the framework of statistical hypothesis testing. We have investigated two types of alternative hypotheses: the mean shift (MS) hypothesis, where the probability distributions of the outliers are thought of as having a shifted mean, and the variance inflation (VI) model, where they are thought of as having an inflated variance. This corresponds to an outlier-generating process thought of as being deterministic or random, respectively. While the first type of alternative hypothesis is routinely applied in geodesy and in many other disciplines, the second is not. However, even if the VI model might be more adequate to describe the stochastics of the observations, this does not mean that it is possible to estimate parameters or detect outliers better than with some less adequate model, like MS.

The test statistic has been derived as the likelihood ratio of the null and the alternative hypothesis. This was motivated by the famous Neyman–Pearson lemma, cf. Neyman and Pearson [20], even though this lemma does not directly apply to this test. Therefore, the performance of the test must be evaluated numerically.

When compared to existing VI approaches, we


We newly found out that the VI stochastic model has some critical disadvantages:


For the first time, the application of the VI model has been investigated for the simplest model of repeated observations. It is shown that here the likelihood function admits at most one local maximum, which exists if the outliers are strong enough. Moreover, in the limiting case that the suspected outliers represent an almost negligible amount of information, the VI test statistic and the MS test statistic have been demonstrated to be almost equivalent.

For *m* = 1 outlier in the repeated observations, there is even a closed formula (54) for the test statistic, and the existence and uniqueness of a local maximum is equivalent to a simply checkable inequality condition. Here, too, the VI test statistic and the MS test statistic are equivalent.

In our numerical investigations, we found that, for *m* > 1 outliers in the repeated observations, the power of the VI test is worse than that of the classical MS test statistic. The reason is the lack of a maximum of the likelihood function, even for sizable outliers. Our numerical investigations also show that the identifiability of the outliers is worse for the VI test statistic. This is clearly seen in the case that the outliers are truly caused by shifted means, but the identifiability is slightly worse in the other case, too. This means that the correct outliers are more often identified with the MS test statistic.

In the considered cases, we did not find real advantages of the VI model, but this does not prove that they do not exist. As long as no such cases are found, we therefore recommend performing outlier detection in practice with the MS model.

**Author Contributions:** Conceptualization, R.L.; Methodology, R.L.; Software, R.L. and M.L.; Validation, M.L.; Investigation, R.L. and M.L.; Writing—Original Draft Preparation, R.L. and M.L.; Writing—Review & Editing, R.L., M.L., F.N.; Visualization, M.L.; Supervision, F.N. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

To prove the monotonicity of the tails of the function *T<sub>VI</sub>*(*e<sub>MS,1</sub>*) in (54), the extreme values of *T<sub>VI</sub>* have to be determined. Since *T<sub>VI</sub>* is symmetric, it is sufficient to restrict the proof to positive values of *e<sub>MS,1</sub>*. The first derivative of *T<sub>VI</sub>* is given by

$$T'\_{VI}\left(e\_{MS,1}\right) = -\frac{4\sigma\_W^2\left(\sigma\_1^2 + \sigma\_W^2\right) - e\_{MS,1}^2\left(\sigma\_1^2 + \sigma\_W^2\right) + e\_{MS,1}\left(\sigma\_1^2 - \sigma\_W^2\right)\sqrt{e\_{MS,1}^2 - 4\sigma\_W^2}}{\sigma\_W^2\left(\sigma\_1^2 + \sigma\_W^2\right)\sqrt{e\_{MS,1}^2 - 4\sigma\_W^2}}.\tag{A1}$$

Setting *T*′*<sub>VI</sub>*(*e<sub>MS,1</sub>*) = 0 yields the two roots

$$e\_{\pm} = \pm \frac{\sigma\_1^2 + \sigma\_W^2}{\sigma\_1}.\tag{A2}$$

Inserting the positive extreme value *e*<sub>+</sub> into the second derivative of *T<sub>VI</sub>*, i.e.,

$$T''\_{VI}(e\_{MS,1}) = \frac{e\_{MS,1} \left(\sigma\_1^2 + \sigma\_W^2\right) + \sqrt{e\_{MS,1}^2 - 4\sigma\_W^2} \left(\sigma\_W^2 - \sigma\_1^2\right)}{\sigma\_W^2 \left(\sigma\_1^2 + \sigma\_W^2\right) \sqrt{e\_{MS,1}^2 - 4\sigma\_W^2}},\tag{A3}$$

identifies *e*<sub>+</sub> as a minimum, because *T*′′*<sub>VI</sub>*(*e*<sub>+</sub>) is always positive for *τ*<sub>1</sub> > 1, cf. (55). For that reason, *T<sub>VI</sub>* is a monotonically increasing function on the interval (*e*<sub>+</sub>, +∞). Figure A1 depicts the positive tail of *T<sub>VI</sub>* and *T*′*<sub>VI</sub>*, respectively, as well as the minimum *e*<sub>+</sub>.

**Figure A1.** Tail of the function *T<sub>VI</sub>* and its first derivative *T*′*<sub>VI</sub>* for positive values of *e<sub>MS,1</sub>*, as well as the minimum *e*<sub>+</sub>.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A Generic Approach to Covariance Function Estimation Using ARMA-Models**

#### **Till Schubert \*, Johannes Korte, Jan Martin Brockmann and Wolf-Dieter Schuh**

Institute of Geodesy and Geoinformation, University of Bonn, 53115 Bonn, Germany;

korte@geod.uni-bonn.de (J.K.); brockmann@geod.uni-bonn.de (J.M.B.); schuh@uni-bonn.de (W.-D.S.)

**\*** Correspondence: schubert@geod.uni-bonn.de

Received: 27 February 2020; Accepted: 11 April 2020; Published: 15 April 2020

**Abstract:** Covariance function modeling is an essential part of stochastic methodology. Many processes in geodetic applications have rather complex, often oscillating covariance functions, where it is difficult to find corresponding analytical functions for modeling. This paper aims to give the methodological foundations for an advanced covariance modeling and elaborates a set of generic base functions which can be used for flexible covariance modeling. In particular, we provide a straightforward procedure and guidelines for a generic approach to the fitting of oscillating covariance functions to an empirical sequence of covariances. The underlying methodology is developed based on the well known properties of autoregressive processes in time series. The surprising simplicity of the proposed covariance model is that it corresponds to a finite sum of covariance functions of second-order Gauss–Markov (SOGM) processes. Furthermore, the great benefit is that the method is automated to a great extent and directly results in the appropriate model. A manual decision for a set of components is not required. Notably, the numerical method can be easily extended to ARMA-processes, which results in the same linear system of equations. Although the underlying mathematical methodology is extensively complex, the results can be obtained from a simple and straightforward numerical method.

**Keywords:** autoregressive processes; ARMA-process; colored noise; continuous process; covariance function; stochastic modeling; time series

#### **1. Introduction and Motivation**

Covariance functions are an important tool to many stochastic methods in various scientific fields. For instance, in the geodetic community, stochastic prediction or filtering is typically based on the collocation or least squares collocation theory. It is closely related to Wiener–Kolmogorov principle [1,2] as well as to the Best Linear Unbiased Predictor (BLUP) and kriging methods (e.g., [3–5]). Collocation is a general theory, which even allows for a change of functional within the prediction or filtering step, simply propagating the covariance to the changed functional (e.g., [6] (p. 66); [4] (pp. 171–173); [7]). In this context, signal covariance modeling is the crucial part that directly controls the quality of the stochastic prediction or filtering (e.g., [8] (Ch. 5)).

Various references on covariance functions are found in geostatistics [9,10] or atmospheric studies [11–14]. The authors in [4,5,8,15–18] cover the topic of covariance functions in the context of various geodetic applications. The discussion of types of covariance functions and investigations on positive definiteness is covered in the literature in e.g., [13,19–21].

In common practice, covariance modelers tend to use simple analytical models (e.g., exponential and Gauss-type) that can be easily adjusted to the first, i.e., short-range, empirically derived covariances. Even then, as these functions are nonlinear, appropriate fitting methods are not straightforward. On the other hand, for more complex covariance models, visual diagnostics such as half-value width, correlation length, or curvature are difficult to obtain or even undefined for certain (e.g., oscillatory) covariance models. In addition, fitting procedures for any of these are sparse and lack an automated procedure (cf. [17,22]).

Autoregressive processes (AR-processes) define a causal recursive mechanism on equispaced data. Stochastic processes and autoregressive processes are covered in many textbooks in a quite unified parametrization (e.g., [23–30]). In contrast to this, various parametrizations of Autoregressive Moving Average (ARMA) processes exist in stochastic theory [31–35]. A general connection of these topics, i.e., stochastic processes, covariance functions, and the collocation theory in the context of geodetic usage is given by [5].

In geodesy, autoregressive processes are commonly used in the filter approach as a stochastic model within a least squares adjustment. Decorrelation procedures by digital filters derived from the parametrization of stochastic processes are widely used as they are very flexible and efficient for equidistant data (cf. e.g., [36–42]). Especially for highly correlated data, e.g., observations along a satellite's orbit, advanced stochastic models can be described by stochastic processes. This is shown in the context of gravity field determination from observations of the GOCE mission with the time-wise approach, where flexible stochastic models are iteratively estimated from residuals [43–45].

The fact that an AR-process also defines an analytical covariance sequence (e.g., [23]) is not well established in geostatistics and geodetic covariance modeling. To counteract this, we relate the covariance function associated with this stochastic process to the frequently used family of covariance functions of second-order Gauss–Markov (SOGM) processes. Expressions for the covariance functions and fitting methods are aligned to the mathematical equations of these stochastic processes. Especially in collocation theory, a continuous covariance function is necessary to obtain covariances for arbitrary lags and functionals required for the prediction of multivariate data. However, crucially, the covariance function associated with an AR-process is in fact defined as a discrete sequence (e.g., [26]).

Whereas standard procedures which manually assess decaying exponential functions or oscillating behavior by visual inspection can miss relevant components, for instance high-frequency oscillations in the signal, the proposed method is automated and easily expandable to higher order models in order to fit components which are not obvious at first glance. A thorough modeling of a more complex covariance model does not only result in a better fit of the empirical covariances but also in a beneficial knowledge of the signal process itself.

Within this contribution, we propose an alternative method for the modeling of covariance functions based on autoregressive stochastic processes. It is shown that the derived covariance can be evaluated continuously and corresponds to a sum of well known SOGM-models. To derive the proposed modeling procedure and guidelines, the paper is organized as follows. After a general overview of the related work and the required context of stochastic modeling and collocation theory in Section 2, Section 3 summarizes the widely used SOGM-process for covariance function modeling. Basic characteristics are discussed to create the analogy to the stochastic processes, which is the key of the proposed method. Section 4 introduces the stochastic processes with a special focus on the AR-process and presents the important characteristics, especially the underlying covariance sequence in different representations. The discrete sequence of covariances is continuously interpolated by re-interpretation of the covariance sequence as a difference/differential equation, which has a continuous solution. Based on these findings, the proposed representation can be easily extended to more general ARMA-processes as it is discussed in Section 5. Whereas the previous chapters are based on the consistent definition of the covariance sequences of the processes, Section 6 shows how the derived equations and relations can be used to model covariance functions from empirically estimated covariances in a flexible and numerically simple way. In Section 7, the proposed method is applied to a one-dimensional time series which often serves as an example application for covariance modeling. The numerical example highlights the flexibility and the advantage of the generic procedure. This is followed by concluding remarks in Section 8.

#### **2. Least Squares Collocation**

In stochastic theory, a measurement model *L* is commonly seen as a stochastic process which consists of a deterministic trend *Aξ*, a random signal *S*, and a white noise component *N* (cf. e.g., [3] (Ch. 2); [5] (Ch. 3.2))

$$
\mathcal{L} = A\boldsymbol{\xi} + \mathcal{S} + \mathcal{N} \,. \tag{1}
$$

Whereas the signal part *S* is characterized by a covariance stationary stochastic process, the noise *N* is usually assumed to be a simple white noise process with uncorrelated components. Autocovariance functions *γ*(|*τ*|) or discrete sequences {*γ*<sub>|*j*|</sub>}<sub>Δ*t*</sub> are models to describe the stochastic behavior of the random variables, i.e., generally *γS*(|*τ*|) and *γN*(|*τ*|), and are required to be positive semi-definite ([28] (Proposition 1.5.1, p. 26)). In case covariances are given as a discrete sequence {*γ*<sub>|*j*|</sub>}<sub>Δ*t*</sub>, they are defined at discrete lags *h* = *j* Δ*t* with sampling interval Δ*t* and *j* ∈ ℕ<sub>0</sub>. In general, autocovariance functions or sequences are functions of a non-negative distance, here *τ* or *h* for the continuous and discrete case, respectively. Thus, covariance functions are often denoted by the absolute distance |*τ*| and |*h*|. Here, we introduce the conditions *τ* ≥ 0 and *h* ≥ 0 in order to omit the vertical bars.

The term Least Squares Collocation, introduced in geodesy by [3,6,46], represents the separability problem within the remove–restore technique, where a deterministic estimated trend component *Ax̃* is subtracted from the measurements ℓ and the remainder Δℓ = ℓ − *Ax̃* is interpreted as a special realization of the stochastic process Δ*L*. In the trend estimation step

$$\widetilde{\mathbf{x}} = \left(\boldsymbol{A}^T \left(\boldsymbol{\Sigma\_{\mathbf{S}\mathbf{S}}} + \boldsymbol{\Sigma\_{\mathcal{N}\mathcal{N}}}\right)^{-1} \boldsymbol{A}\right)^{-1} \boldsymbol{A}^T \left(\boldsymbol{\Sigma\_{\mathbf{S}\mathbf{S}}} + \boldsymbol{\Sigma\_{\mathcal{N}\mathcal{N}}}\right)^{-1} \boldsymbol{\ell} \,, \tag{2}$$

the optimal parameter vector *x̃* is computed from the measurements ℓ as the best linear unbiased estimator (BLUE) of the true trend *Aξ*. The collocation step follows as the best linear unbiased predictor (BLUP) of the stochastic signal *s̃*′ at arbitrary points

$$
\widetilde{\mathbf{s}}' = \boldsymbol{\Sigma}\_{\mathbf{S'}\mathbf{S}} \left( \boldsymbol{\Sigma}\_{\mathbf{SS}} + \boldsymbol{\Sigma}\_{\mathbf{NN}} \right)^{-1} \Delta\boldsymbol{\ell} \tag{3}
$$

or as a filter process at the measured points

$$
\widetilde{\mathbf{s}} = \boldsymbol{\Sigma}\_{\mathbf{SS}} \left( \boldsymbol{\Sigma}\_{\mathbf{SS}} + \boldsymbol{\Sigma}\_{\mathbf{NN}} \right)^{-1} \Delta\boldsymbol{\ell} \tag{4}
$$

([3] (Ch. 2); [5] (Ch. 3)). The variance/covariance matrices **Σ**<sub>*SS*</sub> reflect the stochastic behavior of the random signal *S*. The coefficients are derived by the evaluation of the covariance function *γS*(*τ*), where *τ* represents the lags or distances between the corresponding measurement points. **Σ**<sub>*NN*</sub> denotes the variances/covariances of the random noise, which is often modeled as an independent and identically distributed random process, *γN*(*τ*) = *δ*<sub>0,*τ*</sub> *σ*<sup>2</sup><sub>*N*</sub> with *δ*<sub>*i*,*j*</sub> being the Kronecker delta. *σ*<sup>2</sup><sub>*N*</sub> denotes the variance of the noise such that the covariance matrix reads **Σ**<sub>*NN*</sub> = *σ*<sup>2</sup><sub>*N*</sub> **I**. **Σ**<sub>*S*′*S*</sub> is filled row-wise with the covariances of the signal between the prediction points and the measured points.
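As a numerical illustration, the trend estimation of Equation (2) and the filtering step of Equation (4) can be sketched in a few lines of NumPy. The toy data, covariance models, and all variable names below are illustrative assumptions, not the setup used later in the paper.

```python
import numpy as np

def collocation_filter(A, ell, Sigma_SS, Sigma_NN):
    # BLUE of the trend parameters, Eq. (2)
    W = np.linalg.inv(Sigma_SS + Sigma_NN)
    x_tilde = np.linalg.solve(A.T @ W @ A, A.T @ W @ ell)
    # trend-reduced measurements and filtered signal, Eq. (4)
    d_ell = ell - A @ x_tilde
    s_tilde = Sigma_SS @ W @ d_ell
    return x_tilde, s_tilde

# toy data: linear trend plus exponentially correlated signal and white noise
rng = np.random.default_rng(42)
t = np.linspace(0.0, 10.0, 50)
A = np.column_stack([np.ones_like(t), t])
Sigma_SS = 2.0 * np.exp(-np.abs(t[:, None] - t[None, :]) / 1.5)
Sigma_NN = 0.25 * np.eye(t.size)
ell = A @ np.array([1.0, 0.5]) + rng.standard_normal(t.size)
x_tilde, s_tilde = collocation_filter(A, ell, Sigma_SS, Sigma_NN)
```

In a real application, `Sigma_SS` would be filled by evaluating an estimated covariance function *γS*(*τ*) of the kind derived in the following sections.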

The true covariance functions *γS*(*τ*) and *γN*(*τ*) are unknown and often have to be estimated, i.e., *gS*(*τ*) and *gN*(*τ*), directly from the trend-reduced measurements Δℓ by an estimation procedure using the empirical covariance sequences {*g̃*<sup>Δ*L*</sup><sub>*j*</sub>}<sub>Δ*t*</sub>. The estimated noise variance *s̃*<sup>2</sup><sub>*N*</sub> can be derived from the empirical covariances at lag zero

$$
\tilde{\mathbf{g}}\_0^{\Delta \mathcal{L}} = \tilde{\mathbf{g}}\_0^{\mathbf{S}} + \tilde{\mathbf{s}}\_{\mathcal{N}}^2 \,. \tag{5}
$$

Thus, it is allowed to split up *g̃*<sup>Δ*L*</sup><sub>0</sub> into the signal variance *g̃*<sup>*S*</sup><sub>0</sub> given by the covariance function *gS*(0) = *g̃*<sup>*S*</sup><sub>0</sub> and a white noise component *s̃*<sup>2</sup><sub>*N*</sub>, known as the nugget effect (e.g., [9] (p. 59)). In theory, *s̃*<sup>2</sup><sub>*N*</sub> can be manually chosen such that the function plausibly decreases from *g̃*<sup>*S*</sup><sub>0</sub> towards *g̃*<sup>*S*</sup><sub>1</sub> and the higher lags. A more elegant way is to estimate the analytical function *gS*(*τ*) from the empirical covariances *g̃*<sup>*S*</sup><sub>*j*</sub> with *j* > 0 only. Naturally, all estimated functions *gS*(*τ*) must result in *g̃*<sup>Δ*L*</sup><sub>0</sub> − *gS*(0) ≥ 0.

Covariance function modeling is a task in various fields of application. They are used for example to represent a stochastic model of the observations within parameter estimation in e.g., laserscanning [18], GPS [47,48], or gravity field modeling [49–54]. The collocation approach is closely related to Gaussian Process Regression from the machine learning domain [55]. The family of covariance functions presented here can be naturally used as kernel functions in such approaches.

Within these kinds of applications, the covariance functions are typically fitted to empirically derived covariances which follow from post-fit residuals. Within an iterative procedure, the stochastic model can be refined. Furthermore, they are used to characterize the signal characteristics, again e.g., in gravity field estimation or geoid determination in the context of least squares collocation [7] or in atmospheric sciences [11,14].

Reference [8] (Ch. 3) proposed to look at only three parameters, the variance, correlation length, and curvature at the origin, to fit a covariance function to the empirical covariance sequence. Reference [17] also suggested taking the sequence of zero crossings into account to find an appropriate set of base functions. In addition, most approaches proceed with the fixing of appropriate base functions for the covariance model as a first step. This step corresponds to the determination of the type or family of the covariance function. The fitting then is restricted to this model and does not generalize well to other and more complex cases. Furthermore, it is common to manually fix certain parameters and optimize only a subset of the parameters in an adjustment procedure; see e.g., [18].

Once the covariance model is fixed, various optimization procedures exist to derive the model parameters which result in a best fit of the model to a set of empirical covariances. Thus, another aspect of covariance function estimation is the numerical implementation of the estimation procedure. Visual strategies versus least squares versus Maximum Likelihood, point clouds versus representative empirical covariance sequences, and non-robust versus robust estimators are various implementation possibilities discussed in the literature (see e.g., [11,14,18,47,51,56]). To summarize, the general challenges of covariance function fitting are to find an appropriate set of linearly independent base functions, i.e., the type of covariance function, and to handle the nonlinear nature of the set of chosen base functions, together with the common problem of outlier detection and finding good initial values for the estimation process.

In particular, geodetic data often exhibit negative correlations or even oscillatory behavior in the covariance functions, which leaves a rather limited field of types of covariance functions, e.g., cosine and cardinal sine functions in the one-dimensional case or Bessel functions in the two-dimensional case (e.g., [27]). One general class of covariance functions with oscillatory behavior is discussed in the next section.

#### **3. The Second-Order Gauss–Markov Process**

#### *3.1. The Covariance Function of the SOGM-Process*

A widely used covariance function is based on the second-order Gauss–Markov processes as given in [23] (Equation (5.2.36)) and [25] (Ch. 4.11, p. 185). The process defines a covariance function of the form

$$\begin{split} \gamma(\tau) &= \frac{\sigma^2}{\cos(\eta)} \mathbf{e}^{-\zeta \omega\_0 \tau} \cos \left( \sqrt{1 - \zeta^2} \,\omega\_0 \,\tau - \eta \right) \\ &= \sigma^2 \,\mathbf{e}^{-\zeta \omega\_0 \tau} \left( \cos \left( \sqrt{1 - \zeta^2} \,\omega\_0 \,\tau \right) + \tan(\eta) \sin \left( \sqrt{1 - \zeta^2} \,\omega\_0 \,\tau \right) \right) \\ & \qquad \qquad \qquad \text{with } 0 < \zeta < 1 \text{ and } \omega\_0 > 0 \,\text{ }. \end{split} \tag{6}$$

Its shape is defined by three parameters. *ω*<sup>0</sup> represents a frequency and *ζ* is related to the attenuation. The phase *η* can be restricted to the domain |*η*| < *π*/2 for logical reasons.

A reparametrization with *c* := *ζω*<sub>0</sub> and *a* := √(1 − *ζ*<sup>2</sup>) *ω*<sub>0</sub> gives

$$\begin{split} \gamma(\tau) &= \frac{\sigma^2}{\cos(\eta)} \mathbf{e}^{-c\,\tau} \cos(a \,\tau - \eta) \\ &= \sigma^2 \mathbf{e}^{-c\,\tau} \left( \cos(a \,\tau) + \tan(\eta) \sin(a \,\tau) \right) \qquad \text{with} \quad a, c > 0 \end{split} \tag{7}$$

which highlights the shape of a damped sinusoid. We distinguish between the nominal frequency *a* and the natural frequency *ω*<sub>0</sub> = *a*/√(1 − *ζ*<sup>2</sup>). The relative weight of the sine with respect to the cosine term amounts to *w* := tan(*η*). It is noted here that the SOGM-process is uniquely defined by three parameters. Here, we will use *ω*<sub>0</sub>, *ζ*, and *η* as the defining parameters. Of course, the variance *σ*<sup>2</sup> is a parameter as well. However, as it is just a scale and remains independent of the other three, it is not relevant for the theoretical characteristics of the covariance function.
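The equivalence of the amplitude/phase form (6) and the weighted-sine form (7) is easy to verify numerically. The following sketch evaluates both with arbitrarily chosen parameter values; it also confirms *γ*(0) = *σ*<sup>2</sup>.

```python
import numpy as np

def sogm_cov(tau, sigma2, omega0, zeta, eta):
    # amplitude/phase form of Eq. (6)
    c = zeta * omega0                      # damping
    a = np.sqrt(1.0 - zeta**2) * omega0    # nominal frequency
    return sigma2 / np.cos(eta) * np.exp(-c * tau) * np.cos(a * tau - eta)

def sogm_cov_w(tau, sigma2, omega0, zeta, eta):
    # cosine plus weighted sine form of Eq. (7), weight w = tan(eta)
    c = zeta * omega0
    a = np.sqrt(1.0 - zeta**2) * omega0
    return sigma2 * np.exp(-c * tau) * (np.cos(a * tau) + np.tan(eta) * np.sin(a * tau))

tau = np.linspace(0.0, 20.0, 201)
g6 = sogm_cov(tau, 1.0, 0.8, 0.3, 0.2)   # illustrative parameter values
g7 = sogm_cov_w(tau, 1.0, 0.8, 0.3, 0.2)
```

Both evaluations agree to machine precision, reflecting the angle-difference identity used between Equations (6) and (7).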

The described covariance function is referenced by various names, e.g., the second-order shaping filter [57], the general damped oscillation curve [27] (Equation (2.116)), and the underdamped second-order continuous-time bandpass filter [58] (p. 270). In fact, the SOGM represents the most general damped oscillating autocorrelation function built from exponential and trigonometric terms. For example, the function finds application in VLBI analysis [59,60].

#### *3.2. Positive Definiteness of the SOGM-Process*

At first glance, it is surprising that a damped sine term is allowed in the definition of the covariance function of the SOGM-process (cf. Equation (6)), as the sine is not positive semi-definite. However, it is shown here that it is in fact a valid covariance function, provided that some conditions on the parameters are fulfilled.

The evaluation concerning the positive semi-definiteness of the second-order Gauss–Markov process can be derived by analyzing the process's Fourier transform as given in [25] (Ch. 4.11, p. 185) and the evaluation of it being non-negative (cf. the Bochner theorem, [61]). The topic is discussed and summarized in [60]. With some natural requirements already enforced by 0 ≤ *ζ* ≤ 1 and *ω*<sup>0</sup> > 0, the condition for positive semi-definiteness (cf. e.g., [57] (Equation (A2))) is

$$|\sin(\eta)| \le \zeta \tag{8}$$

which can be expressed by the auxiliary variable *α* := arcsin(*ζ*) as |*η*| ≤ *α*.

In terms of the alternative parameters *a* and *c*, this condition translates to |*w*| ≤ *c*/*a* (cf. e.g., [27] (Equation (2.117))). As a result, non-positive definite functions such as the sine term are allowed in the covariance function only if their relative contribution compared to the corresponding cosine term is small enough.
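The effect of condition (8) can be probed numerically: sampling a valid SOGM covariance on a grid must yield a positive semi-definite matrix, while a phase far outside the bound |*η*| ≤ arcsin(*ζ*) produces negative eigenvalues. The parameter values below are arbitrary illustrations.

```python
import numpy as np

def sogm_cov(tau, zeta, omega0, eta):
    # Eq. (7) with sigma^2 = 1, evaluated for absolute lags
    t = np.abs(tau)
    c, a = zeta * omega0, np.sqrt(1.0 - zeta**2) * omega0
    return np.exp(-c * t) * (np.cos(a * t) + np.tan(eta) * np.sin(a * t))

def min_eigenvalue(eta, zeta=0.4, omega0=1.0, n=60):
    t = np.arange(n, dtype=float)
    C = sogm_cov(t[:, None] - t[None, :], zeta, omega0, eta)
    return np.linalg.eigvalsh(C).min()

alpha = np.arcsin(0.4)                  # bound on the phase, Eq. (8)
lam_ok = min_eigenvalue(0.9 * alpha)    # valid: |eta| < arcsin(zeta)
lam_bad = min_eigenvalue(1.4)           # strongly violates the bound
```

The invalid case even violates |*γ*(*τ*)| ≤ *γ*(0), so already a 2 × 2 principal minor of the sampled matrix is indefinite.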

#### **4. Discrete AR-Processes**

#### *4.1. Definition of the Process*

A more general and more flexible stochastic process is defined by the autoregressive (AR) process. An AR-process is a time series model which relates signal time series values, or more specifically the signal sequence, S*<sup>i</sup>* with autoregressive coefficients *α<sup>k</sup>* as (e.g., [26] (Ch. 3.5.4))

$$\mathcal{S}\_i = \sum\_{k=1}^p \alpha\_k \mathcal{S}\_{i-k} + \mathcal{E}\_i \,. \tag{9}$$

With the transition to *ᾱ*<sub>0</sub> := 1, *ᾱ*<sub>1</sub> := −*α*<sub>1</sub>, *ᾱ*<sub>2</sub> := −*α*<sub>2</sub>, etc., the decorrelation relation to white noise E*<sub>i</sub>* is given by

$$\sum\_{k=0}^{p} \overline{\alpha}\_{k} \mathcal{S}\_{i-k} = \mathcal{E}\_{i} \,. \tag{10}$$

The characteristic polynomial in the factorized form (cf. [26] (Ch. 3.5.4))

$$
\overline{\alpha}\_0 \mathbf{x}^p + \overline{\alpha}\_1 \mathbf{x}^{p-1} + \dots + \overline{\alpha}\_p = \prod\_{k=1}^p (\mathbf{x} - p\_k) \tag{11}
$$

has the roots *p<sub>k</sub>* ∈ ℂ, i.e., the poles of the autoregressive process, which can be complex numbers. This defines a unique transition between coefficients and poles. In common practice, the poles of an AR(*p*)-process only appear as single real poles or as complex conjugate pairs. Following this, an exemplary process of order *p* = 4 can be composed of either two complex conjugate pairs, or one complex pair and two real poles, or four individual single real poles. An odd order gives at least one real pole. For general use, AR-processes are required to be stationary. This requires that all poles lie inside the unit circle, i.e., |*p<sub>k</sub>*| < 1 (cf. [26] (Ch. 3.5.4)).
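The coefficient-to-pole transition of Equation (11) can be sketched directly with a polynomial root finder; the stationarity check is then |*p<sub>k</sub>*| < 1. The example coefficients are constructed, for illustration, from a known complex-conjugate pole pair.

```python
import numpy as np

def ar_poles(alpha):
    # roots of x^p - alpha_1 x^(p-1) - ... - alpha_p, cf. Eq. (11)
    return np.roots(np.concatenate(([1.0], -np.asarray(alpha, dtype=float))))

# AR(2) with poles e^(-c +/- i a): alpha_1 = 2 e^(-c) cos(a), alpha_2 = -e^(-2c)
c, a = 0.2, 0.7
alpha = [2.0 * np.exp(-c) * np.cos(a), -np.exp(-2.0 * c)]
poles = ar_poles(alpha)
stationary = bool(np.all(np.abs(poles) < 1.0))
```

The recovered poles have modulus e<sup>−*c*</sup> and arguments ±*a*, i.e., the transition between coefficients and poles is indeed unique.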

AR-processes define an underlying covariance function as well. We will provide it analytically for the AR(2)-process and will summarize a computation strategy for the higher order processes in the following.

#### *4.2. The Covariance Function of the AR(*2*)-Process*

Although AR-processes are defined by discrete covariance sequences, the covariance function can be written as a closed analytic expression which, evaluated at the discrete lags, gives exactly the discrete covariances. For instance, the AR(2)-process has a covariance function which can be written in analytical form. The variance of AR-processes is mostly parameterized using the variance *σ*<sup>2</sup><sub>E</sub> of the innovation sequence E*<sub>i</sub>*. In this paper, however, we use a representation with the autocorrelation function and a variance *σ*<sup>2</sup> as in [26] (Equation (3.5.36), p. 130) and [62] (Section 3.5, p. 504). In addition, the autoregressive parameters *α*<sub>1</sub> and *α*<sub>2</sub> can be converted to the parameters *a* and *c* via *a* = arccos(*α*<sub>1</sub>/(2√(−*α*<sub>2</sub>))) and *c* = −ln(√(−*α*<sub>2</sub>)). Hence, the covariance function of the AR(2)-process can be written in a somewhat complicated expression in the variables *a* and *c* as

$$\gamma(\tau) = \sigma^2 \sqrt{\left(\cot(a)\tanh(c)\right)^2 + 1}\ \mathbf{e}^{-c\,\tau} \cos(a\ \tau - \arctan(\cot(a)\tanh(c))) \tag{12}$$

using the phase *η* = arctan(cot(*a*) tanh(*c*)) or likewise

$$\gamma(\tau) = \sigma^2 \mathbf{e}^{-c\,\tau} \left( \cos(a \,\tau) + \tanh(c) \cot(a) \,\sin(a \,\tau) \right) \tag{13}$$

with the weight of the sine term *w* = tanh(*c*) cot(*a*).

Please note that in contrast to the SOGM-process the weight or phase in Equations (12) and (13) cannot be set independently, but depends on *a* and *c*. Thus, this model is defined by two parameters only. Therefore, the SOGM-process is the more general model. Caution must be used with respect to the so-called second-order autoregressive covariance model of [11], which is closely related but does not correspond to the standard discrete AR-process.
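That Equation (13) indeed reproduces the discrete autocorrelations of an AR(2)-process can be checked against the Yule–Walker recursion *ρ*<sub>1</sub> = *α*<sub>1</sub>/(1 − *α*<sub>2</sub>), *ρ<sub>j</sub>* = *α*<sub>1</sub>*ρ*<sub>*j*−1</sub> + *α*<sub>2</sub>*ρ*<sub>*j*−2</sub>. The parameter values below are arbitrary.

```python
import numpy as np

c, a = 0.15, 0.6                         # arbitrary damping and frequency
alpha1 = 2.0 * np.exp(-c) * np.cos(a)    # AR(2) coefficients for poles e^(-c +/- i a)
alpha2 = -np.exp(-2.0 * c)

# discrete autocorrelations from the Yule-Walker recursion
rho = [1.0, alpha1 / (1.0 - alpha2)]
for _ in range(18):
    rho.append(alpha1 * rho[-1] + alpha2 * rho[-2])
rho = np.array(rho)

def ar2_cov(tau, a, c):
    # continuous covariance function of Eq. (13) with sigma^2 = 1
    return np.exp(-c * tau) * (np.cos(a * tau) + np.tanh(c) / np.tan(a) * np.sin(a * tau))

match = np.allclose(rho, ar2_cov(np.arange(20.0), a, c))
```

The closed form and the recursion agree at every integer lag, while the closed form additionally interpolates between the lags.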

#### *4.3. AR(p)-Process*

The covariance function of an AR(*p*)-process is given as a discrete series of covariances {*γj*}<sub>Δ*t*</sub> defined at discrete lags *h* = *j* Δ*t* with distance Δ*t*. The Yule–Walker equations (e.g., [24] (Section 3.2); [30] (Equation (11.8)))

$$\gamma\_0 = \alpha\_1 \gamma\_1 + \alpha\_2 \gamma\_2 + \dots + \alpha\_p \gamma\_p + \sigma\_{\mathcal{E}}^2 \tag{14a}$$

$$\gamma\_j = \alpha\_1 \gamma\_{j-1} + \alpha\_2 \gamma\_{j-2} + \dots + \alpha\_p \gamma\_{j-p}\,, \qquad j = 1, \dots, p \tag{14b}$$

$$\gamma\_j = \alpha\_1 \gamma\_{j-1} + \alpha\_2 \gamma\_{j-2} + \dots + \alpha\_p \gamma\_{j-p}\,, \qquad j = p+1, \dots, p+n \tag{14c}$$

directly relate the covariance sequence to the AR(*p*) coefficients. With (14a) being the 0th equation, the next *p* Yule–Walker equations (first to *p*th, i.e., (14b)) form the linear system mostly used for the estimation of the autoregressive coefficients. Note that this system qualifies for the use of the Levinson–Durbin algorithm because it is Toeplitz structured, cf. [30] (Ch. 11.3) and [63].

The linear system containing only the higher equations (14c) is called the modified Yule–Walker (MYW) equations, cf. [64]. This defines an overdetermined system which can be used for estimating the AR-process parameters where *n* lags are included.
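A minimal sketch of coefficient estimation from a covariance sequence: the Yule–Walker system (14b) is set up as a Toeplitz matrix and solved directly (a Levinson–Durbin solver would exploit the structure instead). The function and variable names are illustrative; as a round trip, feeding in exact AR(2) covariances must return the true coefficients.

```python
import numpy as np

def yule_walker(gamma, p):
    # solve Eqs. (14b): Toeplitz system in gamma_0 ... gamma_{p-1}
    G = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(G, np.asarray(gamma[1:p + 1]))

# round trip: exact AR(2) covariances in, the true coefficients out
c, a = 0.15, 0.6
alpha_true = np.array([2.0 * np.exp(-c) * np.cos(a), -np.exp(-2.0 * c)])
gamma = [1.0, alpha_true[0] / (1.0 - alpha_true[1])]
for _ in range(6):
    gamma.append(alpha_true[0] * gamma[-1] + alpha_true[1] * gamma[-2])
alpha_hat = yule_walker(gamma, 2)
```

With empirical covariances, the overdetermined MYW system (14c) would be solved in the least squares sense instead of this square system.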

The recursive relation

$$
\gamma\_i - \alpha\_1 \gamma\_{i-1} - \alpha\_2 \gamma\_{i-2} - \dots - \alpha\_p \gamma\_{i-p} = 0\,,\qquad \text{for} \qquad i = p+1, p+2, \dots \tag{15}
$$

represents a *p*th order linear homogeneous difference equation whose general solution is given by the following equation with respect to a discrete and equidistant time lag *h* = |*t* − *t*′| = *j* Δ*t*

$$\gamma\_j = A\_1 p\_1^h + A\_2 p\_2^h + \dots + A\_p p\_p^h \qquad \text{with} \quad j \in \mathbb{N}\_0, \ h \in \mathbb{R}^+, \ A\_k \in \mathbb{C} \tag{16}$$

cf. [23] (Equation (5.2.44)) and [26] (Equation (3.5.44)). *p<sub>k</sub>* are the poles of the process (cf. Equation (11)) and *A<sub>k</sub>* some unknown coefficients.

It has to be noted here that covariances of AR(*p*)-processes are generally only defined at discrete lags. However, it can be mathematically shown that the analytic function of Equation (13) exactly corresponds to the covariance function Equation (6) of the SOGM. In other words, the interpolation of the discrete covariances is done using the same sinusoidal functions as in Equation (6), such that the covariance function of the AR(*p*)-process can equally be written with respect to a continuous time lag *τ* = |*t* − *t*′| by

$$\gamma(\tau) = \text{Re}\left(A\_1 p\_1^\tau + A\_2 p\_2^\tau + \dots + A\_p p\_p^\tau\right) = \text{Re}\left(\sum\_{k=1}^p A\_k p\_k^\tau\right) \qquad \text{with} \quad \tau \in \mathbb{R}^+, A\_k \in \mathbb{C} \text{ .}\tag{17}$$

This is also a valid solution of Equation (15) in the sense that *γ*(*h*) = *γ<sup>j</sup>* holds. For one special case of poles, which are negative real poles, the function can be complex valued due to the continuous argument *τ*. Thus, the real part has to be taken for general use.

Now, assuming *Ak* and *pk* to be known, Equation (17) can be used to interpolate the covariance defined by an AR-process for any lag *τ*. Consequently, the covariance definition of an AR-process leads to an analytic covariance function which can be used to interpolate or approximate discrete covariances.
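In practice, the coefficients *A<sub>k</sub>* can be obtained from the first *p* covariances by solving a small linear system in the (known) poles, after which Equation (17) interpolates to arbitrary lags. The following sketch does this for the complex-conjugate pole pair of an AR(2)-process; all values are illustrative.

```python
import numpy as np

def covariance_function(poles, gammas):
    # solve sum_k A_k p_k^j = gamma_j for j = 0..p-1, then evaluate Eq. (17)
    poles = np.asarray(poles, dtype=complex)
    p = poles.size
    V = np.vander(poles, N=p, increasing=True).T      # V[j, k] = p_k^j
    A = np.linalg.solve(V, np.asarray(gammas[:p], dtype=complex))
    def gamma(tau):
        tau = np.asarray(tau, dtype=float)
        return np.real(sum(Ak * pk ** tau for Ak, pk in zip(A, poles)))
    return gamma

c, a = 0.15, 0.6
poles = [np.exp(-c + 1j * a), np.exp(-c - 1j * a)]
alpha1, alpha2 = 2.0 * np.exp(-c) * np.cos(a), -np.exp(-2.0 * c)
gamma_seq = [1.0, alpha1 / (1.0 - alpha2)]
for _ in range(6):
    gamma_seq.append(alpha1 * gamma_seq[-1] + alpha2 * gamma_seq[-2])
gamma = covariance_function(poles, gamma_seq)
```

The resulting function reproduces the discrete covariance sequence at the integer lags and can additionally be evaluated at any non-integer lag.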

#### 4.3.1. AR(2)-Model

We can investigate the computation for the processes of second order in detail. Exponentiating a complex number mathematically corresponds to

$$p\_k^\tau = |p\_k|^\tau \left(\cos(\arg(p\_k)\,\tau) + i\sin(\arg(p\_k)\,\tau)\right) \,. \tag{18}$$

As complex poles always appear in conjugate pairs, it is plausible that for complex conjugate pairs *p<sub>k</sub>* = *p*<sub>*l*</sub><sup>∗</sup> the coefficients *A<sub>k</sub>* and *A<sub>l</sub>* are complex and also conjugate to each other, *A<sub>k</sub>* = *A*<sub>*l*</sub><sup>∗</sup>. Thus, *A<sub>k</sub>* *p*<sub>*k*</sub><sup>*τ*</sup> + *A<sub>l</sub>* *p*<sub>*l*</sub><sup>*τ*</sup> becomes *A<sub>k</sub>* *p*<sub>*k*</sub><sup>*τ*</sup> + *A*<sub>*k*</sub><sup>∗</sup> (*p*<sub>*k*</sub><sup>∗</sup>)<sup>*τ*</sup> and the result will be real.

From Equation (13), we can derive that the constants amount to *A*<sub>*k*,*l*</sub> = (*σ*<sup>2</sup>/2)(1 ± *i* tanh(*c*) cot(*a*)) = (*σ*<sup>2</sup>/2)(1 ± *i w*) for the AR(2)-process, and from Equation (18) we can see that *c* = −ln(|*p<sub>k</sub>*|) and *a* = |arg(*p<sub>k</sub>*)|, such that the covariance function can be written as

$$\begin{split} \gamma(\tau) &= \sigma^2 \sqrt{\left(\tanh(\ln(|p\_k|)) \cot(|\arg(p\_k)|) \right)^2 + 1} \\ &|p\_k|^\tau \cos(|\arg(p\_k)|\tau + \arctan(\tanh(\ln(|p\_k|)) \cot(|\arg(p\_k)|))) \\ &= \sigma^2 |p\_k|^\tau \left(\cos(|\arg(p\_k)|\tau) - \tanh(\ln(|p\_k|)) \cot(|\arg(p\_k)|) \sin(|\arg(p\_k)|\tau) \right) . \end{split} \tag{19}$$

It is evident now that the AR(2) covariance model can be expressed as an SOGM covariance function. Whilst the SOGM-process has three independent parameters, here the damping, frequency, and phase of Equation (19) are determined by only two parameters, |*p<sub>k</sub>*| and |arg(*p<sub>k</sub>*)|, based on e<sup>−*c*</sup> = |*p<sub>k</sub>*|, *c* = −ln(|*p<sub>k</sub>*|), *a* = |arg(*p<sub>k</sub>*)|, and *η* = arctan(tanh(ln(|*p<sub>k</sub>*|)) cot(|arg(*p<sub>k</sub>*)|)). Thus, the SOGM-process is the more general model, whereas the AR(2)-process has a phase *η* or weight *w* that is not independent. From Equation (19), the phase *η* can be recovered from the *A<sub>k</sub>* by

$$|\eta\_k| = |\arg(A\_k)|\tag{21}$$

and the weight by |*w*| = |Im(*A<sub>k</sub>*)/Re(*A<sub>k</sub>*)|.
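A one-line numerical check of Equation (21) and the weight relation, using the AR(2) constants *A*<sub>*k*,*l*</sub> = (*σ*<sup>2</sup>/2)(1 ± *i w*) with illustrative values of *a* and *c*:

```python
import numpy as np

c, a = 0.15, 0.6
w = np.tanh(c) * (np.cos(a) / np.sin(a))   # weight of the sine term, cf. Eq. (13)
A_k = 0.5 * (1.0 + 1j * w)                 # A_{k,l} = sigma^2/2 (1 +/- i w), sigma^2 = 1
eta = np.arctan(w)                         # phase of the equivalent SOGM form
```

The argument of *A<sub>k</sub>* equals the phase *η*, and the ratio of its imaginary to real part equals the weight *w*, as stated above.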

#### 4.3.2. AR(1)-Model

Here, the AR(1)-model appears as a limit case. Exponentiating a positive real pole results in exponentially decaying behavior. Thus, for a single real positive pole, one directly gets the exponential Markov-type AR(1) covariance function, also known in the literature as first-order Gauss–Markov (FOGM), cf. [65] (p. 81). A negative real pole causes discrete covariances of alternating sign. In summary, the AR(1)-process gives the exponentially decaying covariance function for 0 < *pk* < 1

$$\begin{aligned} \gamma(\tau) &= \sigma^2 \exp(-c\tau) \qquad \text{with} \quad c = -\ln(|p\_k|) \\ &= \sigma^2 |p\_k|^\tau \end{aligned} \tag{22}$$

or the exponentially decaying oscillation with Nyquist frequency for −1 < *pk* < 0, cf. [23] (p. 163), i.e.,

$$\begin{split} \gamma(\tau) &= \sigma^2 \exp(-c\,\tau) \cos(\pi\,\tau) \\ &= \sigma^2 |p\_k|^\tau \cos(\pi\,\tau) \,. \end{split} \tag{23}$$

#### *4.4. Summary*

From Equation (17), one can set up a linear system

$$\begin{pmatrix} 1 & 1 & 1 & 1 \\ p\_1 & p\_2 & p\_3 & p\_4 \\ p\_1^2 & p\_2^2 & p\_3^2 & p\_4^2 \\ p\_1^3 & p\_2^3 & p\_3^3 & p\_4^3 \end{pmatrix} \begin{pmatrix} A\_1 \\ A\_2 \\ A\_3 \\ A\_4 \end{pmatrix} = \begin{pmatrix} \gamma\_0 \\ \gamma\_1 \\ \gamma\_2 \\ \gamma\_3 \end{pmatrix}\,, \tag{24}$$

here shown exemplarily for an AR(4)-process. The solution of Equation (24) uniquely determines the constants *A<sub>k</sub>* applying standard numerical solvers, assuming the poles to be known from the process coefficients, see Equation (11).

Since Equation (17) is a finite sum over exponentiated poles, the covariance function of a general AR(*p*)-process is a sum of, in case of complex conjugate pairs, AR(2)-processes in the shape of Equation (19) or, in case of real poles, damping terms as given in Equations (22) and (23). The great advantage is that the choice of poles is automatically done by the estimation of the autoregressive process by the YW-Equations (14). Here, we also see that the AR(2)-process as well as both cases of the AR(1)-process can be modeled with Equation (17) such that the proposed approach automatically handles both cases.

Furthermore, we see that Equation (17) adds up the covariance functions of the forms of Equation (19), Equation (22), or Equation (23) for each pole or pair of poles. Any recursive filter can be uniquely dissected into a cascade of second-order recursive filters, described as second-order sections (SOS) or biquadratic filters, cf. [58] (Ch. 11). Correspondingly, the *p* poles can be grouped into complex-conjugate pairs or single real poles. Thus, the higher order model is achieved by concatenating the single or paired poles into the set of *p* poles (vector **p**) and correspondingly by adding up one SOGM covariance function for each section. Nonetheless, this is automatically done by Equation (17).

#### **5. Generalization to ARMA-Models**

#### *5.1. Covariance Representation of ARMA-Processes*

Thus far, we introduced fitting procedures for the estimation of autoregressive coefficients as well as a linear system of equations to simply parameterize the covariance function of AR(*p*)-processes. In this section, we demonstrate that ARMA-models can be handled with the same linear system and the fitting procedure thus generalizes to ARMA-processes.

For the upcoming part, it is crucial to understand that the exponentiation $p_k^\tau$ in Equation (17) exactly corresponds to the exponentiation defined in the following way:

$$\mathrm{e}^{s_k \tau} = \mathrm{e}^{\operatorname{Re}(s_k)\,\tau} \left( \cos(\operatorname{Im}(s_k)\,\tau) + i \sin(\operatorname{Im}(s_k)\,\tau) \right) \tag{25}$$

i.e., $p_k^\tau = \mathrm{e}^{s_k \tau}$, if the transition between the poles $p_k$ and $s_k$ is done by $s_k = \ln(p_k) = \ln(|p_k|) + i \arg(p_k)$ and $p_k = \mathrm{e}^{s_k}$. To be exact, this denotes the transition of the poles from the *z*-domain to the Laplace domain. This parametrization of the autoregressive poles can, for example, be found in [23] (Equation (5.2.46)), [66] (Equation (A.2)), and [26] (Equation (3.7.58)). In these references, the covariance function of the AR-process is given as a continuous function with respect to the poles $s_k$ such that the use of Equation (17) as a continuous function is also justified.
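The correspondence between the two exponentiations can be verified numerically with a tiny sketch (the pole is hypothetical):

```python
import cmath

# Hypothetical z-domain pole of a stationary process (inside the unit circle)
p_k = 0.8 * cmath.exp(1j * 0.5)

# Transition to the Laplace domain: s_k = ln(p_k) = ln|p_k| + i*arg(p_k)
s_k = cmath.log(p_k)

# The two exponentiations agree: p_k^tau = e^(s_k * tau)
for tau in (1, 2, 5):
    assert abs(p_k ** tau - cmath.exp(s_k * tau)) < 1e-12

# And the pole is recovered by p_k = e^(s_k)
assert abs(cmath.exp(s_k) - p_k) < 1e-12
```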

In the literature, several parametrizations of the moving average part exist. Here, we analyze the implementation of [33] (Equation (2.15)), where the covariance function of an ARMA-process is given by

$$\gamma(\tau) = \sum_{k=1}^{p} \frac{b(s_k)}{a'(s_k)} \frac{b(-s_k)}{a(-s_k)}\, \mathrm{e}^{s_k \tau}. \tag{26}$$

Inserting $p_k^\tau = \mathrm{e}^{s_k \tau}$ and denoting $A_k := \frac{b(s_k)}{a'(s_k)} \frac{b(-s_k)}{a(-s_k)}$, which is independent of $\tau$, we obtain

$$\gamma(\tau) = \sum_{k=1}^{p} \frac{b(s_k)}{a'(s_k)} \frac{b(-s_k)}{a(-s_k)}\, p_k^{\tau} = \sum_{k=1}^{p} A_k\, p_k^{\tau} \tag{27}$$

which is suitable for the purpose of understanding the parametrization chosen in this paper. Now, the covariance corresponds to the representation of Equation (17). The equation is constructed as a finite sum of complex exponential functions, each weighted by a term consisting of the polynomials $a(\cdot)$ and $a'(\cdot)$ for the AR-part and $b(\cdot)$ for the MA-part, evaluated at the positions of the positive and negative autoregressive roots $s_k$.

It is evident in Equation (26) that *τ* is only linked with the poles. This exponentiation of the poles builds the term which is responsible for the damped oscillating or solely damping behavior. The fraction builds the weighting of these oscillations exactly the same way as the *Ak* in Equation (17). In fact, Equation (17) can undergo a partial fraction decomposition and be represented as in Equation (26). The main conclusion is that ARMA-models can also be realized with Equation (17). The same implication is also gained from the parametrizations by [67] (Equation (3.2)), [31] (Equation (48)), [68] (Equation (9)), and [34] (Equation (4)). It is noted here that the moving average parametrization varies to a great extent in the literature in the sense that very different characteristic equations and zero and pole representations are chosen.

As a result, although the MA-part is considerably more complex than the AR-part and modeled very differently throughout the literature, the MA-parameters solely influence the coefficients *Ak* weighting the exponential terms, which themselves are solely determined by the autoregressive part. This is congruent with the findings of Equation (19), where frequency and damping of the SOGM-process are encoded in the autoregressive poles *pk*.

#### *5.2. The Numerical Solution for ARMA-Models*

Autoregressive processes of order *p* have the property that *p* + 1 covariances are uniquely given by the process, i.e., by the coefficients and the variance. All higher model-covariances can be recursively computed from the previous ones, cf. Equation (15). This property generally does not hold for empirical covariances, where each covariance typically is an independent estimate. Now, suppose Equation (24) is solved as an overdetermined system by including higher empirical covariances, i.e., covariances that are not recursively defined. The resulting weights *Ak* will automatically correspond to general ARMA-models because the poles *pk* are fixed.

Precisely, the contradictions within the overdetermined system will, to some extent, inevitably be absorbed by the weights *Ak* and thus by some, for the moment unknown, MA-coefficients. The model is still an SOGM-process because the number of poles is still two and the SOGM covariance function is the most general damped oscillating function. The two AR-poles uniquely define the two SOGM-parameters frequency *ω*<sup>0</sup> and attenuation *ζ*. The only free parameter left to fit an ARMA-model into the shape of the general damped oscillating function (SOGM-process) is the phase *η*. Hence, the MA-part of arbitrary order will only result in a single weight or phase as in Equation (19), and the whole covariance function can be represented by an SOGM-process. Consequently, the *Ak* will differ from those of Equation (20), cf. [29] (p. 60), but the phase can still be recovered from Equation (21).

In summary, the general ARMA(2,*q*)-model (Equation (26)) is also realizable with Equation (17) and thus with the linear system of Equation (24). Here, we repeat the concept of second-order sections. Any ARMA(*p*,*q*)-process can be uniquely dissected into ARMA(2,2)-processes. Thus, our parametrization of linear weights to complex exponentials can realize pure AR(2)- and general SOGM-processes, which can be denoted as ARMA(2,*q*)-models. These ARMA(2,*q*)-processes form single SOGM-processes with corresponding parameters *ω*0, *ζ*, and *η*. The combination of the ARMA(2,*q*)-processes to the original ARMA(*p*,*q*)-process is the operation of addition for the covariance (function), concatenation for the poles, and convolution for the coefficients. Thus, the expansion to higher orders is similar to the pure AR(*p*) case. The finite sum adds up the covariance function of each second-order section, which is an SOGM-process.

The weights would have to undergo a partial-fraction decomposition to give the MA-coefficients. Several references exist for decomposing Equation (26) into the MA-coefficients, known as spectral factorization, e.g., by partial fraction decomposition. In this paper, we stay with the simplicity and elegance of Equation (17).

#### **6. Estimation and Interpolation of the Covariance Series**

Within this section, the theory summarized above is used for covariance modeling, i.e., estimating covariance functions $g^S(\tau)$, which can be evaluated for any lag $\tau$, from a sequence of given empirical covariances $\hat{g}^{\Delta L}_j$ sampled at interval $\Delta t$. Here, the choice of estimator for the empirical covariances is not discussed, and it is left to the user whether to use the biased or the unbiased estimator of the empirical covariances, cf. [23] (p. 174) and [69] (p. 252).

The first step is the estimation of the process coefficients from the $\hat{g}^{\Delta L}_j$ with the process order *p* defined by the user. Furthermore, different linear systems have been discussed for this step, cf. Equation (14), also depending on the choice of *n*, which is the index of the highest lag included in Equation (14). These choices already have a significant impact on the goodness of fit of the covariance function to the empirical covariances, as will be discussed later. The resulting AR-coefficients $\alpha_k$ can be directly converted to the poles $p_k$ using the factorization of the characteristic polynomial (Equation (11)).
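A minimal sketch of this first step, here for the plain Yule–Walker variant with a hypothetical covariance sequence:

```python
import numpy as np

# Sketch: Yule-Walker estimation of AR(p) coefficients from empirical
# covariances g[0..n], followed by the conversion to poles.
def ar_from_covariances(g, p):
    # Toeplitz system  G alpha = g[1..p]  (Yule-Walker, cf. Eq. (14))
    G = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    alpha = np.linalg.solve(G, g[1:p + 1])
    # Poles = roots of z^p - alpha_1 z^{p-1} - ... - alpha_p
    poles = np.roots(np.concatenate(([1.0], -alpha)))
    return alpha, poles

g = np.array([1.0, 0.6, 0.1, -0.2, -0.3])   # hypothetical covariance sequence
alpha, poles = ar_from_covariances(g, p=2)

# For a stationary estimate, all poles must lie inside the unit circle
assert np.all(np.abs(poles) < 1.0)
```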

For the second step, based on Equation (16), a linear system of *m* equations, with *m* ≥ *p*, can be set up, but now for the empirical covariances. Using the first *m* covariances, but ignoring the lag-0 value contaminated by the nugget effect, this results in a system like Equation (24), but now in the empirical covariances $\hat{g}^{\Delta L}_j = \hat{g}^S_j$, $j > 0$:

$$
\begin{bmatrix}
\hat{g}^S_1 \\
\hat{g}^S_2 \\
\vdots \\
\hat{g}^S_{m-1} \\
\hat{g}^S_m
\end{bmatrix} =
\begin{bmatrix}
p_1^1 & p_2^1 & \dots & p_{p-1}^1 & p_p^1 \\
p_1^2 & p_2^2 & \dots & p_{p-1}^2 & p_p^2 \\
\vdots & \vdots & & \vdots & \vdots \\
p_1^{m-1} & p_2^{m-1} & \dots & p_{p-1}^{m-1} & p_p^{m-1} \\
p_1^m & p_2^m & \dots & p_{p-1}^m & p_p^m
\end{bmatrix}
\begin{bmatrix}
A_1 \\
A_2 \\
\vdots \\
A_{p-1} \\
A_p
\end{bmatrix} \tag{28}
$$

For *m* = *p*, the system can be uniquely solved, resulting in coefficients *Ak* which model the covariance function as an AR(*p*)-process. The case *m* > *p* results in a fitting problem of the covariance model to the *m* empirical covariances $\hat{g}_j$ with *p* unknowns. This overdetermined system can be solved, for instance, in the least squares sense to derive estimates *Ak* from the $\hat{g}_j$. As was discussed, these *Ak* signify a process modeling as an ARMA-model. Here, one could use the notation $\hat{A}_k$ in order to indicate adjusted parameters, in contrast to the uniquely determined *Ak* of the pure AR-process. For the sake of a unified notation with Equation (17), this is omitted.
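The two cases can be sketched numerically; the poles and the noisy "empirical" covariances below are hypothetical, generated for illustration only:

```python
import numpy as np

# Hypothetical conjugate pole pair and synthetic noisy empirical covariances
rng = np.random.default_rng(7)
poles = np.array([0.85 * np.exp(1j * 0.35), 0.85 * np.exp(-1j * 0.35)])
p = len(poles)

lags = np.arange(1, 15)
g_emp = 0.85**lags * np.cos(0.35 * lags) + 0.01 * rng.standard_normal(lags.size)

# Case m = p: unique solution -> pure AR(p)-process
V_p = poles[np.newaxis, :] ** lags[:p, np.newaxis]
A_unique = np.linalg.solve(V_p, g_emp[:p].astype(complex))

# Case m > p: least squares fit with fixed poles -> underlying ARMA-model
V_m = poles[np.newaxis, :] ** lags[:, np.newaxis]
A_fit, *_ = np.linalg.lstsq(V_m, g_emp.astype(complex), rcond=None)
g_fit = (V_m @ A_fit).real
```

In the unique case, the model interpolates the first *p* covariances exactly; in the overdetermined case, the misfit is distributed over all *m* lags.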

Due to the possible nugget effect, it is advised to exclude $\hat{g}^{\Delta L}_0$ and solve Equation (28); however, it can also be included, cf. Equation (24). Moreover, a possible procedure is to generate a plausible $\hat{g}^S_0$ from a manually determined $\hat{s}^2_N$ by $\hat{g}^S_0 = \hat{g}^{\Delta L}_0 - \hat{s}^2_N$. Equally, the MYW-Equations are a possibility to circumvent using $\hat{g}^{\Delta L}_0$.

#### *Modeling Guidelines*

The idea of solving the system for the weights *Ak* is outlined in [23] (p. 167). In the following, we summarize some guidelines to estimate the covariance function starting at the level of some residual observation data.

#### **Initial steps:**


#### **Estimation of the autoregressive process:**

	- **–** solving the Yule–Walker equations, i.e., Equation (14b), or
	- **–** solving the modified Yule–Walker equations, i.e., Equation (14c), in the least squares sense using $\hat{g}^S_1$ to $\hat{g}^S_n$.

#### **Estimation of the weights** *Ak***:**

	- **–** uniquely using *m* = *p* to determine the *Ak*. This results in a pure AR(*p*)-process.
	- **–** or in an overdetermined manner in the least squares sense, i.e., with *m* > *p*. This results in an underlying ARMA-process.

Here, it also needs to be examined whether the empirical covariances decrease sufficiently towards the high lags. If not, the stationarity of the residuals can be questioned and an enhanced trend reduction might be necessary.

Using the YW-Equations can be advantageous in order to get a unique (well-determined) system to be solved for the *αk*. By this, one realizes that the analytic covariance function exactly interpolates the first *p* + 1 covariances, which supports the fact that they are uniquely given by the process, cf. Equation (14b). On the other hand, including higher lags in the process estimation enables long-range dependencies with lower-degree models. The same holds for solving the system of Equation (28) for the *Ak* using only *m* = *p* or *m* > *p* covariances. Which procedure gives the best fit to the covariances is left to the user and depends strongly on the characteristics of the data and the application. In summary, there are several possible choices, cf. Table 1.


**Table 1.** Fitting procedures.
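Under these choices, the whole chain (MYW coefficient estimation, pole computation, and an overdetermined solve for the weights) can be sketched as follows; the AR(2) covariance sequence is synthetic and all numbers are hypothetical:

```python
import numpy as np

# Hypothetical true AR(2) process and its exact covariance sequence
a_true = np.array([1.7, -0.81])
g = np.empty(25)
g[0] = 1.0
g[1] = a_true[0] * g[0] / (1.0 - a_true[1])      # Yule-Walker relation at lag 1
for j in range(2, 25):
    g[j] = a_true[0] * g[j - 1] + a_true[1] * g[j - 2]

p, n, m = 2, 24, 14

# Modified Yule-Walker in the least squares sense: g_j = sum_k alpha_k g_{j-k}
rows = np.arange(1, n + 1)
M = np.array([[g[abs(j - k)] for k in range(1, p + 1)] for j in rows])
alpha, *_ = np.linalg.lstsq(M, g[rows], rcond=None)
poles = np.roots(np.concatenate(([1.0], -alpha)))

# Overdetermined solve for the weights A_k, excluding lag 0 (cf. Eq. (28))
tau = np.arange(1, m + 1)
V = poles[np.newaxis, :] ** tau[:, np.newaxis]
A, *_ = np.linalg.lstsq(V, g[tau].astype(complex), rcond=None)
g_model = (V @ A).real
```

Since the synthetic covariances are exactly those of a stationary AR(2)-process, the estimated coefficients and the fitted covariance model reproduce the input closely.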

Finally, the evaluation of the analytic covariance function (Equation (17)) can be done by evaluating the same linear system at arbitrary, e.g., densely spaced, lags *τ*:

$$
\begin{bmatrix}
g^S(\tau_1) \\
g^S(\tau_2) \\
\vdots \\
g^S(\tau_n)
\end{bmatrix} =
\begin{bmatrix}
p_1^{\tau_1} & p_2^{\tau_1} & \dots & p_{p-1}^{\tau_1} & p_p^{\tau_1} \\
p_1^{\tau_2} & p_2^{\tau_2} & \dots & p_{p-1}^{\tau_2} & p_p^{\tau_2} \\
\vdots & \vdots & & \vdots & \vdots \\
p_1^{\tau_n} & p_2^{\tau_n} & \dots & p_{p-1}^{\tau_n} & p_p^{\tau_n}
\end{bmatrix}
\begin{bmatrix}
A_1 \\
A_2 \\
\vdots \\
A_{p-1} \\
A_p
\end{bmatrix} \tag{29}
$$

Although complex *pk* and *Ak* are involved, the resulting covariance function values are theoretically real. Due to limited numerical precision, however, the imaginary part is only numerically zero. Thus, it is advised to eliminate the imaginary part in any case.
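A short sketch of this evaluation with hypothetical poles and weights; non-integer lags are handled via the principal branch of the logarithm, $p_k^\tau = \mathrm{e}^{\ln(p_k)\,\tau}$:

```python
import numpy as np

# Hypothetical conjugate pair of poles and weights
poles = np.array([0.8 * np.exp(1j * 0.4), 0.8 * np.exp(-1j * 0.4)])
A = np.array([0.5 - 0.2j, 0.5 + 0.2j])

tau = np.linspace(0.0, 20.0, 201)        # dense, non-integer lags
s = np.log(poles)                        # z-domain -> Laplace domain
g_dense = np.exp(np.outer(tau, s)) @ A   # Eq. (29) evaluated row by row

# The imaginary part is only numerically zero; keep the real part
assert np.max(np.abs(g_dense.imag)) < 1e-12
g_dense = g_dense.real
```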

#### **7. An Example: Milan Cathedral Deformation Time Series**

This example is the well-known deformation time series of Milan Cathedral [62]. The measurements are levelling heights of a pillar in the period from 1965 to 1977. It is an equidistant time series of 48 values with sampling interval Δ*t* = 0.25 years. In [62], the time series is deseasonalized and then used for further analysis with autoregressive processes. In this paper, a consistent modeling of the whole signal component without deseasonalization is seen as advantageous. The time series is detrended using a linear function, and the remaining residuals define the stochastic signal, cf. Figure 1.

**Figure 1.** Milan Cathedral time series.

Based on the detrended time series, the biased estimator is used to determine the empirical covariances in all examples. The order for the AR(*p*)-process is chosen as *p* = 4. Four different covariance functions were determined based on the method proposed here. All second-order components of the estimated covariance functions are converted to the SOGM parametrization and individually tested for positive semi-definiteness using Equation (8).

As a kind of reference, a manual approach (cf. Figure 2a) was chosen. A single SOGM is adjusted by manually tuning the parameters to achieve a good fit to all empirical covariances, ignoring $\hat{g}^{\Delta L}_0$. This function follows the long-wavelength oscillation contained in the signal.

In a second approach, the process coefficients are estimated using the YW-Equations (14b) with the covariances $\hat{g}^{\Delta L}_0$ to $\hat{g}^{\Delta L}_p$. Furthermore, the system in Equation (28) is solved uniquely using the lags from $\hat{g}^{\Delta L}_1$ up to $\hat{g}^{\Delta L}_m$ with *m* = *p*. In Figure 2b, the covariance function exactly interpolates the first five values. This covariance model of relatively low order already contains a second, high-frequency oscillation with a 1-year period, caused by annual temperature variations, which is not obvious at first. However, the function misses the long-wavelength oscillation and does not fit well to the higher covariances. Here, the model order can of course be increased in order to interpolate more covariances. However, it will be demonstrated now that the covariance structure at hand can in fact also be modeled with the low-order AR(4)-model.

For the remaining examples, the process coefficients are estimated using the MYW-Equations (14c) with *n* = 24 covariances. As a first case, the function was estimated with a manually chosen nugget effect, i.e., $\hat{g}^S_0 = \hat{g}^{\Delta L}_0 - 0.0019$, and by solving Equation (28) with *m* = *p* − 1, which results in a pure AR(4)-process. The covariance function represented by Figure 2c approximates the correlations better compared to the first two cases. The shape models all oscillations but does not exactly pass through all values at the lags *τ* = 1.5 to 4 years.

Finally, the system of Equation (28) was solved using the higher lags $\hat{g}^S_m$ up to *m* = 14 in order to enforce an ARMA-model. However, the best fit exceeds $\hat{g}^{\Delta L}_0$, such that the best valid fit is achieved with no nugget effect in the signal covariance model and a covariance function exactly interpolating $\hat{g}^{\Delta L}_0$. Thus, we fix $g^S(0)$ to $\hat{g}^{\Delta L}_0$ using a constrained least squares adjustment, which is the result shown here. In comparison, Figure 2d shows the most flexible covariance function. The function passes very close to nearly all covariances up to *τ* = 6 years, and the fit is still good beyond that. The approximated ARMA-process of order *p* = 4 allows more variation of the covariance function and fits better to the higher lags.

**Figure 2.** Covariance functions for the Milan Cathedral data set determined with four different approaches: (**a**) Intuitive ("naive") approach using a single SOGM-process with manually adjusted parameters. (**b**) Covariance function resulting from the interpolation of the first *p* + 1 covariances with a pure AR(4)-process. (**c**) Covariance function resulting from an approximation procedure of $\hat{g}^S_0$ to $\hat{g}^S_{24}$ using a pure AR-process with *p* = 4. (**d**) Covariance function based on approximation with the most flexible ARMA-process (*p* = 4). Empirical covariances are shown by the black dots. The variances $\hat{g}^S_0$ of the covariance functions are indicated by circles. The parameters of the processes (**c**) and (**d**) are provided in Tables 2 and 3.

The corresponding process parameters for the last two examples are listed in Tables 2 and 3. All parameters are given as numbers in rational approximation. The positive definiteness is directly visible from the condition |*η*| ≤ *α*. The approximated pure AR(4)-process in Table 2 is relatively close to being invalid for the second component but still within the limits. For the ARMA-model, notably, poles, frequency, and attenuation parameters are the same, cf. Table 3. Just the phases *η* of the two components are different and also inside the bounds of positive definiteness. The resulting covariance function using the ARMA-model (Figure 2d) is given by

$$\begin{aligned} g^S(\tau) ={}& 0.01680\, \mathrm{e}^{-0.0723\,\frac{\tau}{\Delta t}} \left( \cos\!\left(0.1950\,\tfrac{\tau}{\Delta t}\right) - 0.3167\, \sin\!\left(0.1950\,\tfrac{\tau}{\Delta t}\right) \right) \\ &+ 0.00555\, \mathrm{e}^{-0.0633\,\frac{\tau}{\Delta t}} \left( \cos\!\left(1.5662\,\tfrac{\tau}{\Delta t}\right) + 0.02614\, \sin\!\left(1.5662\,\tfrac{\tau}{\Delta t}\right) \right) \end{aligned} \tag{30}$$

It needs to be noted here that the connection of unitless poles and parameters with the lag *τ* cannot be done in the way indicated by Sections 2–6. In fact, for the correct covariance function, the argument *τ* requires a scaling with the sampling interval Δ*t*, which was omitted for reasons of brevity and comprehensibility. As a consequence, the scaling 1/Δ*t* of the argument *τ* is included in Equation (30). The same factor also applies to the transition from *ω*<sup>0</sup> to the ordinary frequency *ν*<sup>0</sup> given in Tables 2 and 3.

**Table 2.** Approximation with pure AR(2)-components. The frequency is also given as ordinary frequency *ν*0. The variance of the combined covariance function amounts to *σ*<sup>2</sup> = 0.02046 mm<sup>2</sup>, which leads to *σ*<sub>N</sub> = 0.04359 mm. *ω*<sup>0</sup> and *η* are given in units of radians.


**Table 3.** Approximation with SOGM-components, i.e., ARMA(2,*q*)-models. The variance of the combined covariance function amounts to *σ*<sup>2</sup> = 0.02235 mm<sup>2</sup>.


Using the process characteristics in Tables 2 and 3, it is obvious now that the long wavelength oscillation has a period of about 7.6 years. Diagnostics of the estimated process can be done in the same way in order to possibly discard certain components if they are irrelevant. In summary, the proposed approach can realize a much better combined modeling of long and short-wavelength signal components without manual choices of frequencies, amplitudes, and phases. The modified Yule–Walker equations prove valuable for a good fit of the covariance function due to the stabilization by higher lags. ARMA-models provide a further enhanced flexibility of the covariance function.

#### **8. Summary and Conclusions**

In this paper, we presented an estimation procedure for covariance functions based on the methodology of stochastic processes and a simple and straightforward numerical method. The approach is based on the analogy between the covariance functions defined by the SOGM-process and those of autoregressive processes. Thus, we provide the most general damped oscillating autocorrelation function built from exponential and trigonometric terms, which includes several simple analytical covariance models as limit cases.

The covariance models of autoregressive processes as well as of ARMA-processes correspond to a linear combination of covariance functions of second-order Gauss–Markov (SOGM) processes. We provide fitting procedures for these covariance functions to empirical covariance estimates based on simple systems of linear equations. Notably, the numerical method easily extends to ARMA-processes with the same linear system of equations. In the future, research will be done on possibilities of constraining the bounds of stationarity and positive definiteness in the estimation steps.

The great advantage is that the method is automated and delivers the complete model instead of requiring a manual modeling of each component. Our method is very flexible because the process estimation automatically chooses complex or real poles, depending on whether more oscillating or only decaying covariance components are necessary to model the process. Naturally, our approach is restricted to stationary time series. In non-stationary cases, the empirically estimated covariance sequence would not decrease with increasing lag and would thereby contradict the specifications of covariance functions, e.g., $\hat{g}^{\Delta L}_0$ being the largest covariance, see Section 2 and [28]. Such an ill-shaped covariance sequence will definitely result in non-stationary autoregressive poles, and the method will fail.

The real world example has shown that covariance function estimation can in fact give good fitting results even for complex covariance structures. The guidelines presented here provide multiple possibilities for fitting procedures and process diagnostics. As a result, covariance function estimation is greatly automatized with a generic method and a more consistent approach to a complete signal modeling is provided.

**Author Contributions:** Formal analysis, investigation, validation, visualization, and writing—original draft: T.S., methodology, writing—review and editing: T.S., J.K., J.M.B., and W.-D.S., funding acquisition, project administration, resources, supervision: W.-D.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Deutsche Forschungsgemeinschaft (DFG) Grant No. SCHU 2305/7-1 'Nonstationary stochastic processes in least squares collocation (NonStopLSC)'.

**Acknowledgments:** We thank the anonymous reviewers for their valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


YW Yule–Walker

#### **References**


c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Evaluation of VLBI Observations with Sensitivity and Robustness Analyses**

**Pakize Küreç Nehbit 1,2,\*, Robert Heinkelmann 2, Harald Schuh 2,3, Susanne Glaser 2, Susanne Lunz 2, Nicat Mammadaliyev 2,3, Kyriakos Balidakis 2, Haluk Konak <sup>1</sup> and Emine Tanır Kayıkçı <sup>4</sup>**


Received: 27 April 2020; Accepted: 3 June 2020; Published: 8 June 2020

**Abstract:** Very Long Baseline Interferometry (VLBI) plays an indispensable role in the realization of global terrestrial and celestial reference frames and in the determination of the full set of Earth Orientation Parameters (EOP). The main goal of this research is to assess the quality of VLBI observations based on sensitivity and robustness criteria. Sensitivity is defined as the minimum displacement value that can be detected in the coordinate unknowns. Robustness describes the deformation strength induced by the maximum undetectable errors from the internal reliability analysis. The location of a VLBI station and the total weights of the observations at the station are most important for the sensitivity analysis. Furthermore, the total number of observations of a radio source and the quality of the observations are important for the sensitivity levels of the radio sources. According to the robustness analysis of station coordinates, the worst robustness values are caused by atmospheric delay effects with high temporal and spatial variability. During CONT14, FORTLEZA, WESTFORD, and TSUKUB32 have robustness values between 0.8 and 1.3 mm, which are significantly worse in comparison to the other stations. The radio sources 0506-612, NRAO150, and 3C345 have worse sensitivity levels compared to the other radio sources. It can be concluded that sensitivity and robustness analyses are reliable measures to obtain high-accuracy VLBI solutions.

**Keywords:** very long baseline interferometry; sensitivity; internal reliability; robustness; CONT14

#### **1. Introduction**

Very Long Baseline Interferometry (VLBI) is used to measure the arrival time differences of the signals that come from extragalactic radio sources to antennas separated by up to one Earth diameter. The main principle of the VLBI technique is to observe the same extragalactic radio source synchronously with at least two radio telescopes. Global distances can be measured with millimeter accuracy using the VLBI technique [1,2].

In 1967, for the first time, VLBI was used for the detection of light deflection [3,4]. Nowadays, VLBI is a primary technique to determine global terrestrial reference frames and in particular their scale, the celestial reference frame, and the Earth Orientation Parameters (EOP), which consist of universal time and terrestrial and celestial pole coordinates [2]. Since VLBI is the only technique that connects the celestial with the terrestrial reference frames, the technique is fundamentally different from the other space geodetic techniques. The radio sources are objects in the International Celestial Reference Frame (ICRF), whereas the antenna coordinates are obtained in the International Terrestrial Reference Frame (ITRF).

As a result of its important role in both the celestial and the terrestrial reference frame, it is essential to investigate the quality of VLBI observations and their effect on the unknown parameters. For this reason, the quality of the VLBI observations was investigated according to sensitivity and robustness criteria. Although robustness and sensitivity criteria are not new methods in geodesy, they have been applied to VLBI observations for the first time in this study. The locations of weak stations and weak radio sources were easily detected using the sensitivity and robustness criteria. Using reliability criteria allows detecting observations whose undetectable gross errors affect the unknown parameters. Besides, the investigation of the sensitivity of the network against outliers plays a crucial role in the improvement of accuracy. In this way, scheduling can be improved using this method in the future.

Sensitivity criteria have been an inspiration for many scientific investigations. The criteria can be explained as the network capability for the monitoring of crustal movements and deformations [5]. So far, mostly geodetic networks based on GPS measurements have been analyzed: sensitivity levels were determined in [6], the datum definition was investigated using sensitivity in [7], a priori sensitivity levels were computed in [8], and the determination of the experimental sensitivity capacities was examined in [9].

Robustness criteria were developed as a geodetic network analysis alternative to standard statistical analysis in [10]. Robustness analysis is the combination of the reliability analysis introduced in [11] and the geometrical strength analysis. Robustness has been the main topic of many studies until today. Different strain models were defined with homogeneous and heterogeneous deformation models in [12]. The displacement vectors defined the effect of the undetectable gross error on the coordinate unknowns, which was determined independently from the translation in [13]. In addition, to obtain the corrected displacement vector, global initial conditions represented by the whole station network were used. Local initial conditions aiming at minimizing the total displacement were developed for the polyhedron represented by each network point as defined in [14].

The paper is organized as follows. Section 2 presents the theoretical background of the sensitivity analysis in the VLBI network. Section 3 introduces the theoretical background of robustness. Section 4 investigates the sensitivity and robustness levels of the VLBI network observed during the continuous campaign CONT14, which will be further described there. In Section 4, 15 VLBI sessions were evaluated, and the outliers were detected using the software VieVS@GFZ (G2018.7, GFZ, Potsdam, Germany) [15], a fork of the Vienna VLBI Software [16]. The least-squares adjustment module of the VieVS@GFZ software was modified to determine the sensitivity levels of the stations and the radio sources and to obtain the robustness levels of the observing stations. The sensitivity levels of the stations and the radio sources were obtained using the developed module for the 15 24-h sessions and compared session by session. In addition, the deformation resistance induced by the maximum undetectable errors from the internal reliability analysis was computed for each session. The obtained displacement vectors were compared with threshold values. Conclusions and recommendations for further research are provided in Section 5.

#### **2. The Sensitivity in the VLBI Network**

Sensitivity is the minimum value of undetectable gross errors in the adjusted coordinate differences. The sensitivity levels give information about the weakness of a network. The sensitivity level is computed using the cofactor matrix of the displacement vector estimated from two different sessions.

Using the adjusted coordinates $\hat{x}^1$, $\hat{x}^2$ and their cofactor matrices $Q^1_{xx}$, $Q^2_{xx}$ based on the different sessions 1 and 2, the displacement vector $d$ and the corresponding cofactor matrix $Q_{dd}$ for one point (reference point of a station or radio source) are obtained using the following equations:

$$d = \hat{x}^1 - \hat{x}^2 \tag{1}$$

$$Q_{dd} = Q_{xx}^1 + Q_{xx}^2. \tag{2}$$

Alternatively, when the aim is to obtain the sensitivity level of each individual session as an a priori sensitivity level, the cofactor matrix of the displacement vector is obtained as $Q_{dd} = Q_{xx}$ [9,14], and the weight matrix of the displacement vector for each station, $P_{d_i}$, is computed by the following equations:

$$
\begin{bmatrix} d\_1 \\ d\_2 \\ \vdots \\ d\_n \end{bmatrix} = \begin{bmatrix} N\_{11} & N\_{12} & \dots & N\_{1n} \\ N\_{21} & N\_{22} & \dots & N\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ N\_{n1} & N\_{n2} & \dots & N\_{nn} \end{bmatrix}^{+} \begin{bmatrix} \left(\boldsymbol{A}^T \boldsymbol{P} d\boldsymbol{l}\right)\_1 \\ \left(\boldsymbol{A}^T \boldsymbol{P} d\boldsymbol{l}\right)\_2 \\ \vdots \\ \left(\boldsymbol{A}^T \boldsymbol{P} d\boldsymbol{l}\right)\_n \end{bmatrix} \tag{3}
$$

$$d\_i = \begin{bmatrix} dx\_i \\ dy\_i \\ dz\_i \end{bmatrix} = \bar{\mathbf{N}}\_i \mathbf{A}^T P dl \tag{4}$$

$$\mathbf{Q}\_{d\_i d\_i} = \bar{\mathbf{N}}\_i \mathbf{A}^T \mathbf{P} \mathbf{Q}\_{ll} \mathbf{P} \mathbf{A} \bar{\mathbf{N}}\_i^T = \bar{\mathbf{N}}\_i \mathbf{N} \bar{\mathbf{N}}\_i^T \tag{5}$$

$$P\_{d\_i} = \left(Q\_{d\_i d\_i}\right)^{-1} \tag{6}$$

where $i = 1, \dots, n$ indexes the stations, $A$ is the design matrix, $P$ is the weight matrix, $N$ is the normal equation matrix, $A^T P\,dl$ is the right-hand side vector, $Q_{ll}$ is the cofactor matrix of the observations, $d_i$ is the displacement vector at the $i$-th station, $P_{d_i}$ is the weight matrix of the displacement vector at the $i$-th station, and $\bar{N}_i$ is the sub-matrix of the normal equation system for the $i$-th station.

The obtained weight matrix $P_{d_i}$ is decomposed into its eigenvalues and eigenvectors. The minimum detectable displacement value, namely the best sensitivity level $d_{min}$, depends on the inverse of the maximum eigenvalue of the weight matrix, $\lambda_{max}$, for each station

$$d\_{min} = \frac{W\_0 \sigma}{\sqrt{\lambda\_{max}}} \tag{7}$$

where $\sigma$ is derived from the theoretical variance of the unit weight [6], and the threshold value of the non-centrality parameter, $W_0$, is determined through $W_0 = W(\alpha_0, \gamma_0, h, \infty)$ based on the power of the test $\gamma_0 = 80\%$, the significance level $\alpha_0 = 5\%$, and the degree of freedom $h = 1$ [17,18].
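The sensitivity computation of Equations (6) and (7) can be sketched numerically as follows. The cofactor matrix, $\sigma$, and $W_0$ below are illustrative assumptions, not values from the CONT14 analysis:

```python
# Sketch of Eqs. (6)-(7): sensitivity level of one station from the
# cofactor matrix of its displacement vector. All numbers are assumed.
import numpy as np

Q_dd = np.array([[4.0, 1.0, 0.5],      # assumed 3x3 cofactor matrix of the
                 [1.0, 3.0, 0.2],      # displacement vector of one station
                 [0.5, 0.2, 2.0]])

P_d = np.linalg.inv(Q_dd)                  # Eq. (6): weight matrix of d
lam_max = np.max(np.linalg.eigvalsh(P_d))  # largest eigenvalue of P_d

sigma = 1.0   # std. dev. of unit weight (assumed)
W_0 = 4.13    # threshold for alpha_0 = 5%, gamma_0 = 80%, h = 1 (assumed)

d_min = W_0 * sigma / np.sqrt(lam_max)     # Eq. (7): best sensitivity level
```

Note that the largest eigenvalue of $P_{d}$ is the reciprocal of the smallest eigenvalue of $Q_{dd}$, so a weakly determined displacement direction directly worsens $d_{min}$.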

With single-session VLBI analysis, station and radio source coordinates, clock parameters, pole coordinates, Universal Time 1 (UT1) minus Coordinated Universal Time (UTC), celestial pole coordinates, and atmosphere parameters can be determined [2,19]. To evaluate the VLBI observations, the mathematical model of the least-squares adjustment is expanded by the matrix of constraints $H$. The functional model for the actual observations $l$ and the constraint parameters $l_h$ can be written as follows:

$$
\mathbf{v} = A\mathbf{x} - \mathbf{l} \tag{8}
$$

$$
\mathbf{v}\_c = H\mathbf{x} - \mathbf{l}\_h \tag{9}
$$

where *v* is the residual vector, *v<sup>c</sup>* is the residual vector of constraints, and *x* denotes the vector of the unknown parameters [20]. Accordingly, the functional model of the adjustment can be summarized with the following symbolic equations:

$$
\begin{bmatrix} \mathbf{v} \\ \mathbf{v}\_c \end{bmatrix} = \begin{bmatrix} A \\ H \end{bmatrix} \mathbf{x} - \begin{bmatrix} \mathbf{l} \\ \mathbf{l}\_h \end{bmatrix} \tag{10}
$$

and the corresponding stochastic model of the adjustment is written as:

$$P = \begin{bmatrix} P\_{ll} & \\ & P\_c \end{bmatrix} \tag{11}$$

where *Pll* is the weight matrix of the actual observations, *P<sup>c</sup>* is the weight matrix of the constraint parameters, and the remaining elements of this block-diagonal matrix are equal to zero. According to the adjustment model, the cofactor matrix of the unknown parameters is determined as:

$$\mathbf{Q}\_{xx} = \left(\mathbf{A}^T \mathbf{P} \mathbf{A} + \mathbf{H}^T \mathbf{P}\_c \mathbf{H}\right)^{-1} \tag{12}$$

where *Qxx* covers all unknown parameters of the respective VLBI session.
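Equation (12) can be sketched with toy values as follows; the matrices $A$, $H$ and the weights below are made up for illustration and do not represent a real VLBI session:

```python
# Minimal numerical sketch of Eq. (12): cofactor matrix of a
# constrained least-squares adjustment. All inputs are synthetic.
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((10, 3))   # design matrix: 10 observations, 3 unknowns
P = np.eye(10)                     # weight matrix of the actual observations
H = np.ones((1, 3))                # one pseudo-observation constraining x
P_c = np.eye(1) * 1e3              # weight of the constraint

N = A.T @ P @ A + H.T @ P_c @ H    # constrained normal equation matrix
Q_xx = np.linalg.inv(N)            # Eq. (12): cofactor matrix of the unknowns
```

The constraint term $H^T P_c H$ regularizes the normal matrix, which is why the constrained system stays invertible even when $A^T P A$ alone is rank-deficient.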

Using the functional and the stochastic models, the unknown parameters are computed with a free network adjustment. The cofactor matrix of the displacement vector of each station is as follows

$$\mathbf{Q}\_{xx} = \begin{bmatrix} \ddots & & & & & & & \\ & q\_{x1x1} & q\_{x1x2} & q\_{x1y1} & q\_{x1y2} & q\_{x1z1} & q\_{x1z2} & \\ & q\_{x2x1} & q\_{x2x2} & q\_{x2y1} & q\_{x2y2} & q\_{x2z1} & q\_{x2z2} & \\ & q\_{y1x1} & q\_{y1x2} & q\_{y1y1} & q\_{y1y2} & q\_{y1z1} & q\_{y1z2} & \\ & q\_{y2x1} & q\_{y2x2} & q\_{y2y1} & q\_{y2y2} & q\_{y2z1} & q\_{y2z2} & \\ & q\_{z1x1} & q\_{z1x2} & q\_{z1y1} & q\_{z1y2} & q\_{z1z1} & q\_{z1z2} & \\ & q\_{z2x1} & q\_{z2x2} & q\_{z2y1} & q\_{z2y2} & q\_{z2z1} & q\_{z2z2} & \\ & & & & & & & \ddots \end{bmatrix}\_{u,u} \tag{13}$$

where $u$ is the number of unknown parameters. For the first station, the matrix $\bar{N}_1$ is determined as

$$
\bar{\mathbf{N}}\_1 = \begin{bmatrix}
\dots & q\_{x1x1} & q\_{x1x2} & q\_{x1y1} & q\_{x1y2} & q\_{x1z1} & q\_{x1z2} & \dots \\
\dots & q\_{y1x1} & q\_{y1x2} & q\_{y1y1} & q\_{y1y2} & q\_{y1z1} & q\_{y1z2} & \dots \\
\dots & q\_{z1x1} & q\_{z1x2} & q\_{z1y1} & q\_{z1y2} & q\_{z1z1} & q\_{z1z2} & \dots
\end{bmatrix}\_{3,u} \tag{14}
$$

and the cofactor matrix of the displacement vector of the first station is obtained as

$$\left(\mathbf{Q}\_{d\_1d\_1}\right)\_{3,3} = \bar{\mathbf{N}}\_1 \mathbf{N} \bar{\mathbf{N}}\_1^T = \begin{bmatrix} q\_{x1x1} & q\_{x1y1} & q\_{x1z1} \\ q\_{y1x1} & q\_{y1y1} & q\_{y1z1} \\ q\_{z1x1} & q\_{z1y1} & q\_{z1z1} \end{bmatrix}\_{3,3}.\tag{15}$$

In analogy, for each radio source, the cofactor matrix of the displacement vector is obtained using the following equations

$$\mathbf{Q}\_{xx} = \begin{bmatrix} \ddots & & & & & \\ & q\_{\alpha 1\alpha 1} & q\_{\alpha 1\alpha 2} & q\_{\alpha 1\delta 1} & q\_{\alpha 1\delta 2} & \\ & q\_{\alpha 2\alpha 1} & q\_{\alpha 2\alpha 2} & q\_{\alpha 2\delta 1} & q\_{\alpha 2\delta 2} & \\ & q\_{\delta 1\alpha 1} & q\_{\delta 1\alpha 2} & q\_{\delta 1\delta 1} & q\_{\delta 1\delta 2} & \\ & q\_{\delta 2\alpha 1} & q\_{\delta 2\alpha 2} & q\_{\delta 2\delta 1} & q\_{\delta 2\delta 2} & \\ & & & & & \ddots \end{bmatrix}\_{u,u}. \tag{16}$$

Here, $\alpha$ and $\delta$ are the equatorial coordinates of the source, i.e., right ascension and declination, respectively, and $u$ is defined as above. For the first radio source, the matrix $\bar{N}_1$ is determined as

$$
\bar{\mathbf{N}}\_1 = \begin{bmatrix}
\dots & q\_{\alpha 1\alpha 1} & q\_{\alpha 1\alpha 2} & q\_{\alpha 1\delta 1} & q\_{\alpha 1\delta 2} & \dots \\
\dots & q\_{\delta 1\alpha 1} & q\_{\delta 1\alpha 2} & q\_{\delta 1\delta 1} & q\_{\delta 1\delta 2} & \dots
\end{bmatrix}\_{2,u} \tag{17}
$$

and the cofactor matrix of the displacement vector of the first radio source is

$$\left(\mathbf{Q}\_{d\_1d\_1}\right)\_{2,2} = \bar{\mathbf{N}}\_1 \mathbf{N} \bar{\mathbf{N}}\_1^T = \begin{bmatrix} q\_{\alpha 1\alpha 1} & q\_{\alpha 1\delta 1} \\ q\_{\delta 1\alpha 1} & q\_{\delta 1\delta 1} \end{bmatrix}\_{2,2}.\tag{18}$$

Subsequently, the corresponding weight matrix belonging to each station or radio source is computed as shown in Equation (6), and the minimum value of the undetectable gross errors is found by Equation (7).
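The block-extraction of Equations (13)–(15) can be illustrated as follows. The cofactor matrix below is synthetic and, for simplicity, the normal matrix is taken as a regular inverse rather than the pseudoinverse of a free network:

```python
# Sketch of Eqs. (13)-(15): pick the rows of Q_xx belonging to one
# station and form the cofactor matrix of its displacement vector.
# Q_xx is synthetic (u = 9 unknowns); the index positions are assumed.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((9, 9))
Q_xx = M @ M.T + 9 * np.eye(9)   # synthetic positive-definite cofactor matrix
N = np.linalg.inv(Q_xx)          # normal matrix consistent with Q_xx

idx = [3, 4, 5]                  # assumed rows of x, y, z of the first station
N_bar = Q_xx[idx, :]             # Eq. (14): 3 x u sub-matrix of Q_xx
Q_d1 = N_bar @ N @ N_bar.T       # Eq. (15): 3 x 3 cofactor matrix of d_1
```

Because $\bar{N}_1 N \bar{N}_1^T = (Q_{xx} N Q_{xx})_{1,1}$, the result reduces to the station's own $3\times 3$ block of $Q_{xx}$, exactly as the right-hand side of Equation (15) states.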

#### **3. The Robustness of VLBI Stations**

Robustness is defined as a function of the reliability criteria [10]. More specifically, the robustness of a geodetic network is its strength against the deformation caused by gross errors that remain undetectable in the internal reliability analysis. The robustness analysis consists of enhancing the internal reliability analysis with the strain technique [10,21].

Internal reliability can be interpreted as the control of an observation by the other observations in a network. It is quantified as the magnitude of the undetectable gross error obtained by hypothesis testing. For correlated observations, the internal reliability of the $j$-th observation is obtained with the following equations:

$$
\Delta\_{0j} = m\_0 \sqrt{\frac{W\_0}{\mathbf{e}\_j^T \mathbf{P} \mathbf{Q}\_{\hat{v}\hat{v}} \mathbf{P} \mathbf{e}\_j}} \tag{19}
$$

$$\mathbf{e}\_{j}^{T} = \begin{bmatrix} \dots & 0 & 0 & 1 & 0 & \dots \end{bmatrix} \tag{20}$$

where $m_0$ is derived from the a posteriori value of the experimental variance, $Q_{\hat{v}\hat{v}}$ is the cofactor matrix of the residuals, and $e_j^T$ is a selection vector, which consists of 1 for the $j$-th observation and 0 for all other observations; its dimension equals the total number of observations.
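Equations (19) and (20) can be sketched on a toy adjustment problem; the one-unknown model, $m_0$, and $W_0$ below are assumptions chosen only to make the computation concrete:

```python
# Illustrative computation of Eqs. (19)-(20): internal reliability of
# one observation in a tiny synthetic adjustment (4 obs, 1 unknown).
import numpy as np

A = np.array([[1.0], [1.0], [1.0], [1.0]])   # design matrix
P = np.diag([1.0, 1.0, 1.0, 4.0])            # last observation weighted higher
Q_ll = np.linalg.inv(P)                      # cofactor matrix of observations

Q_xx = np.linalg.inv(A.T @ P @ A)
Q_vv = Q_ll - A @ Q_xx @ A.T                 # cofactor matrix of the residuals

m_0 = 1.0    # a posteriori std. dev. of unit weight (assumed)
W_0 = 17.07  # non-centrality parameter for alpha_0 = 5%, gamma_0 = 80% (assumed)

j = 0
e_j = np.zeros(4); e_j[j] = 1.0              # Eq. (20): selection vector
denom = e_j @ P @ Q_vv @ P @ e_j
delta_0j = m_0 * np.sqrt(W_0 / denom)        # Eq. (19): internal reliability
```

A small denominator $e_j^T P Q_{\hat{v}\hat{v}} P e_j$ means the observation is poorly controlled by the others, so its undetectable gross error $\Delta_{0j}$ is large.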

The robustness of each VLBI station is quantified as the effect of the maximal undetectable gross error on the coordinate unknowns (Δ*x*) [10,13,22] as

$$
\Delta \mathbf{x} = \mathbf{Q} \mathbf{A}^T \mathbf{P} \Delta\_{0j} \tag{21}
$$

$$
\Delta\_{0j}^{T} = \begin{bmatrix} \dots & 0 & 0 & \Delta\_{0j} & 0 & \dots \end{bmatrix} \tag{22}
$$

where $\Delta_{0j}^T$ is a vector which contains the internal reliability value of the $j$-th observation and 0 for all other observations; its dimension equals the total number of observations. The displacement vector can be written as

$$
\Delta \mathbf{x}\_i = \begin{bmatrix} \Delta x\_i \\ \Delta y\_i \\ \Delta z\_i \end{bmatrix} = \begin{bmatrix} u\_i \\ v\_i \\ w\_i \end{bmatrix} \tag{23}
$$

where *ui*, *vi*, and *wi* are the displacement vector components in the x-, y-, and z-directions.

$$
\Delta \mathbf{x}^T = \begin{bmatrix} \Delta \mathbf{x}\_1 & \Delta \mathbf{x}\_2 & \dots & \Delta \mathbf{x}\_j \end{bmatrix} \tag{24}
$$

The effect of an undetected gross error can be calculated for any coordinate unknown, and it can be obtained once per observation. Each observation causes strain with a different magnitude and direction. For this reason, the observation having the maximum effect on the coordinate unknowns must be identified

$$
\Delta \mathbf{x}\_{0j} = \max \{ \left| \Delta \mathbf{x}\_j \right| \}. \tag{25}
$$

It is supposed that the observation having a maximum vector norm causes maximum strain. To compute the vector norm of each observation, the L1 norm is used as

$$\|\Delta \mathbf{x}\_{j}\| = |\Delta \mathbf{x}\_{1}| + |\Delta \mathbf{x}\_{2}| + \dots + |\Delta \mathbf{x}\_{u}|\tag{26}$$

where *u* is the number of unknowns.
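The search of Equations (21)–(26) can be sketched as follows; the design matrix and the internal reliability values are synthetic, serving only to show the mechanics of selecting the max-effect observation:

```python
# Sketch of Eqs. (21)-(26): propagate each observation's internal
# reliability onto the unknowns, then pick the observation whose effect
# has the largest L1 norm. All inputs are synthetic.
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 3))    # 6 observations, 3 unknowns
P = np.eye(6)
Q = np.linalg.inv(A.T @ P @ A)     # cofactor matrix of the unknowns
nabla = np.array([2.1, 1.7, 3.0, 2.4, 1.9, 2.6])  # internal reliabilities

norms = []
for j in range(6):
    d0 = np.zeros(6); d0[j] = nabla[j]   # Eq. (22): reliability vector
    dx = Q @ A.T @ P @ d0                # Eq. (21): effect on the unknowns
    norms.append(np.sum(np.abs(dx)))     # Eq. (26): L1 norm of the effect

j_max = int(np.argmax(norms))            # Eq. (25): max-effect observation
```

The L1 norm aggregates the absolute displacements over all unknowns, so an observation that spreads a moderate effect over many coordinates can outrank one that hits a single coordinate hard.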

For the strain computation, the surface formed by the station and its neighboring stations, which are connected through baselines, is used. The strain resulting from the effect of the undetectable gross errors on the coordinate unknowns can be obtained for the polyhedron represented by each network point, with affine or extended Helmert transformation models [14,22].

The displacement vector related to the strain parameters can be determined with the equations:

$$
\begin{bmatrix}
\Delta x\_i \\
\Delta y\_i \\
\Delta z\_i
\end{bmatrix} = E\_i \begin{bmatrix}
X\_i - X\_0 \\
Y\_i - Y\_0 \\
Z\_i - Z\_0
\end{bmatrix} \tag{27}
$$

$$E\_i = \begin{bmatrix} \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} & \frac{\partial u}{\partial z} \\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y} & \frac{\partial v}{\partial z} \\ \frac{\partial w}{\partial x} & \frac{\partial w}{\partial y} & \frac{\partial w}{\partial z} \end{bmatrix} = \begin{bmatrix} e\_{xx} & e\_{xy} & e\_{xz} \\ e\_{yx} & e\_{yy} & e\_{yz} \\ e\_{zx} & e\_{zy} & e\_{zz} \end{bmatrix} \tag{28}$$

where $E_i$ is the strain tensor, $X_0$, $Y_0$, and $Z_0$ are the initial conditions, $X_i$, $Y_i$, and $Z_i$ are the coordinate unknowns of the $i$-th station located on the surface, and $e_{xx}$ is the rate of change of the displacement in the x-direction with respect to the position component in the x-direction [12].

The strain parameters are independent of the location of the surfaces in the coordinate system. For this reason, at each surface the strain tensor is computed with respect to a reference point $P_0$ selected on the surface. Using the obtained strain tensor, the objective function is linearized according to the initial conditions via

$$\sum\_{i=1}^{n} \left(\Delta \mathbf{x}\right)^{T} \mathbf{E}\_{i}^{T} \mathbf{E}\_{i} (\Delta \mathbf{x}) \to \min \tag{29}$$

$$\sum\_{i=1}^{n} \mathbf{E}\_i^T \mathbf{E}\_i (\mathbf{X}\_i - \mathbf{X}\_0) = \mathbf{0} \tag{30}$$

$$-\sum\_{i=1}^{n} \mathbf{E}\_i^T \mathbf{E}\_i \mathbf{X}\_0 + \sum\_{i=1}^{n} \mathbf{E}\_i^T \mathbf{E}\_i \mathbf{X}\_i = 0 \tag{31}$$

where the initial conditions $X_0^T = [X_0\ Y_0\ Z_0]$ are computed as follows

$$\mathbf{X}\_0 = \left[\sum\_{i=1}^n E\_i^T E\_i\right]^{-1} \sum\_{i=1}^n E\_i^T E\_i \mathbf{X}\_i \tag{32}$$

Inserting these initial conditions into Equation (27) yields the corrected displacement vector [13]; in other words, the displacement vector is translated to the gravity center of the surface. Its magnitude is computed as:

$$d\_i = \sqrt{u\_i^2 + v\_i^2 + w\_i^2}.\tag{33}$$

If the corrected displacement vector is estimated from the surface represented by the whole network of stations, the corrected global displacement vector is obtained. However, if the corrected displacement vector is estimated from the surface represented by each station, local initial conditions (*XL*0) are obtained as

$$\mathbf{X}\_{L0} = \left(\mathbf{E}\_i^T \mathbf{E}\_i\right)^{-1} \mathbf{E}\_i^T \mathbf{E}\_i \sum\_{i=1}^m \mathbf{X}\_i \tag{34}$$

where *m* is the number of stations that have a baseline to the *i th* station. Using the local initial conditions, the corrected local displacement vector is computed via Equation (27). The computed magnitudes of the displacement vectors are compared with the threshold value estimated from confidence ellipsoids [23]:

$$W\_i = m\_0 \sqrt{3F\_{h,f,1-\alpha\_0}\,\mathrm{trace}(Q\_{xx})} \tag{35}$$

where $f$ is the degree of freedom and $\alpha_0$ is the significance level.
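The threshold of Equation (35) and the robustness decision can be sketched as follows; $m_0$, the F-quantile, and the trace are assumed values for illustration, not results from CONT14:

```python
# Sketch of Eq. (35) and the robustness decision d_i <= W_i.
# All numerical values are assumed for demonstration.
import numpy as np

m_0 = 1.0          # a posteriori std. dev. of unit weight (assumed)
F_quant = 2.60     # F_{h,f,1-alpha_0}, roughly h = 3, large f, alpha_0 = 5%
trace_Qxx = 4.5    # trace of the station's cofactor block (assumed, cm^2)

W_i = m_0 * np.sqrt(3 * F_quant * trace_Qxx)  # Eq. (35): threshold value

d_i = 1.2          # magnitude of the corrected displacement vector (assumed, cm)
is_robust = d_i <= W_i                        # station is robust if d_i <= W_i
```

In the paper's terms, a station fails this test (is not robust) exactly when $d_i > W_i$, i.e., when the deformation induced by undetectable gross errors exceeds what the confidence ellipsoid tolerates.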

In case of *di* > *Wi*, we conclude that the network station is not robust [13]. In other words, the network is not sensitive enough to possible outliers and their disturbing effects on the coordinate unknowns.

Because the displacement vectors obtained for any station represent the effect of undetectable errors on the coordinate unknowns [14,22], the displacement vectors can also be compared to the sensitivity levels $d_{min}$.

#### **4. Results**

In order to compare VLBI stations and radio sources under approximately the same conditions, such as scheduling and station geographical distribution, we selected the continuous VLBI Campaign 2014 (CONT14) (https://ivscc.gsfc.nasa.gov/program/cont14) for the numerical test. CONT14 consists of 15 continuous VLBI sessions observed from 2014-May-6, 00:00 UT to 2014-May-20, 23:59 UT. The observations of CONT14 were evaluated session by session with the software VieVS@GFZ, written in MATLAB.

To obtain the sensitivity levels of the radio sources and stations, Equations (6), (7), and (13)–(18) mentioned in Section 2 and, to obtain the robustness values of the network stations, Equations (19)–(26) mentioned in Section 3 were added to the least-squares adjustment module in VieVS@GFZ.

In order to obtain the strain parameters on the surfaces, displacement vector components and observed baselines were computed with a small C++ program for each session. According to the strain parameters, magnitudes of the corrected local and global displacement vectors were determined for each station and compared with the threshold values.

#### *4.1. Results of the Sensitivity Analysis*

The sensitivity level of a station reflects the total observation weights of the station and the remoteness of the station in the network. A small sensitivity value indicates that a station is strongly controlled by the other stations and hence, its sensitivity level is better.

According to the sensitivity analysis of the CONT14 campaign, the subset of European stations, ONSALA60, WETTZELL, ZELENCHK, MATERA, YEBES40M, and partly NYALES20 have the best sensitivity levels based on all sessions, whereas BADARY provides the worst sensitivity level based on all sessions (Figure 1). Across the sessions, there are small but significant differences as well.

The sensitivity levels of the radio sources show that some radio sources in individual sessions have orders of magnitude larger sensitivity levels, e.g., NRAO150, 3C345, 3C454.3, and 0506-612 (Figure 2).

**Figure 1.** The sensitivity level distributions of the antennas in continuous VLBI Campaign 2014 (CONT14). On the horizontal axis, the antennas are displayed in their respective order of appearance, i.e., unsorted.

**Figure 2.** The sensitivity distributions of the radio sources in CONT14. Sources appear randomly on the horizontal axis.

#### *4.2. Result of the Robustness Analysis*

The robustness of the network is computed based on the internal reliability and reflects the maximum effect of the undetectable gross error on the coordinate unknowns. In well-designed geodetic networks, the internal reliability values of the observations can be expected to stay below $8m_i$, where $m_i$ is defined as the average error of the observations [8,24–27].

In each session, all observations were tested regarding whether they have gross errors. After the outliers were detected and removed from the observation list, the internal reliability of the observations was investigated.

In Figure 3, some internal reliability values with very large magnitudes can be easily identified. To investigate these large internal reliability magnitudes, the radio sources (and baselines) involved in the observations were identified (Table 1). Comparing the findings to the sensitivity of the radio sources, it could be seen that these radio sources also had the worst sensitivity magnitudes.

If an acceptable mathematical model is used for the adjustment, the statistical analyses can be obtained confidently. For this reason, internal reliability and sensitivity analysis should be performed for all observations.

After all observations of the radio sources mentioned in Table 1 were excluded, it was found that the remaining internal reliabilities fell into a significantly smaller range in Figure 4 compared to Figure 3. Using the outlier-free radio source list, the sensitivity level of the radio sources was obtained. It is seen that radio source 3C454.3 has the maximum sensitivity level (Figure 5). In order to investigate the robustness of the stations with best quality observations, the radio sources having the worst sensitivity levels were excluded. When the observations are reduced according to both internal reliability and the sensitivity levels, the internal reliability criteria can be obtained for the well-designed network.

**Figure 3.** Internal reliability values of the observations in session 14MAY08XA (CONT14) (observations exceeding the red line were identified as outliers and excluded).

**Figure 4.** Internal reliability values of the outlier-free observations in session 14MAY08XA.


**Table 1.** Observations with largest internal reliability values in session 14MAY08XA.

After this step, the robustness values of the stations were computed. For this purpose, the observation having maximum effect on the coordinate unknowns in each session was selected for the robustness analysis. According to Table 2, it is clearly seen in all sessions that the FORTLEZA station is affected. Table 2 also displays the radio sources that were involved in the observations affecting FORTLEZA. However, the radio sources are identified as rather compact sources because of their small CARMS (closure amplitude rms) values based on natural weighting [28], which are below 0.4.

As mentioned above, the maximum effect of an undetected gross error on the station coordinates is called a displacement vector, and it was computed using Equation (21) for CONT14. According to the obtained displacement vector components for CONT14, the magnitudes of the displacement vector components in the x and y directions are about the same but with different signs, whereas the magnitude in the z direction is about one order of magnitude smaller. In all sessions, FORTLEZA is the station most affected by undetected gross errors. If we focus on the motion of FORTLEZA during CONT14, the x component of the displacement vector was found to be between about 2 and 4 mm (Figure 6).

**Figure 5.** The sensitivity distribution of the radio sources after outlier elimination in session 14MAY08XA.

In each session, the robustness of the stations was obtained with the displacement vector components. To obtain the strain parameters, the surface that was used consists of the station and its neighboring stations connected through baselines. The strain parameters were computed using Equation (27) for the surface that contains each antenna. Using the strain parameters computed for all surfaces represented by the stations, the local displacement vectors were translated according to the gravity center of the surfaces with Equations (27) and (34). The distributions of the local displacement vector magnitudes are illustrated in Figure 7 for one session only: 14MAY08XA.


**Table 2.** List of observations with a maximum effect of the undetectable gross error on the station coordinates distribution during CONT14.

**Figure 6.** The effect of the undetectable gross errors on station coordinates of FORTLEZA.

**Figure 7.** Distribution of local displacement vector magnitudes for Very Long Baseline Interferometry (VLBI) antennas in session 14MAY08XA.

According to Figure 7, FORTLEZA, WESTFORD, and TSUKUB32 have the largest displacement vector magnitudes ranging between 0.8 and 1.3 mm. It can be easily seen that these antennas are affected by the observation having the maximum effect of the undetectable gross errors on the station coordinates.

To address the robustness of the antennas, the computed local displacement vector values were compared to the threshold values obtained applying Equation (35) and the sensitivity levels of the stations as obtained with Equation (7). It was found that all the stations are robust against undetectable gross errors, since the magnitudes of the local displacement vectors are smaller than the threshold values (Figure 8).

**Figure 8.** Robustness analysis for VLBI stations in session 14MAY08XA.

#### **5. Discussion**

In an astronomical aspect, the structure of the radio sources can cause errors [29], and the astrometric quality of the radio sources is defined by the structure index. Sources with an X-band index of 1 or 2 and S-band index of 1 can be considered as sources of the best astrometric quality. Furthermore, it is recommended that sources with an X-band index of 3 or 4 should not be used [30]. Besides that, previous studies indicate that source structure is a major error source in geodetic VLBI [31]. Sources such as 3C345, 2128−123, and 2134+004 having observations with larger internal reliability values compared to the other sources have a structure index (http://bvid.astrophy.u-bordeaux.fr/) of 4.13, 4.56, and 3.73, respectively, in the X-band. In addition, radio source NRAO150 with a structure index of 2.06 in the S-band has also observations with larger internal reliability. If the radio sources are compared in the view of their sensitivity levels and structure indices, it can be easily understood that the radio source 3C454.3 having a larger sensitivity level has a structure index of 2.9 in the S-band and of 3.84 in the X-band. The radio source 3C454.3 is defined as a quasi-stellar object (QSO) with a core-jet structure that elongates toward the west and bends toward the north-west.

In the robustness analysis, an observation having a maximum effect on the coordinate unknowns affects the stations used for observing it, and their neighboring stations connected with baselines, more seriously than the other stations in the network. For this reason, network geometry and observation plans are substantial for the robustness analysis. According to the result of the robustness analysis, the observation on the FORTLEZA–WESTFORD baseline has the maximum effect on the coordinate unknowns. In other words, larger magnitudes of the displacement vectors are obtained at these stations. As a result, both FORTLEZA and WESTFORD have larger robustness values. In addition, because of the network geometry and the observation plan, TSUKUB32 also has a larger robustness value than the other stations.

Although the selection of CONT14 is convenient for an initial analysis, the measurements may have systematic errors that cannot be detected in the error analysis because of the short duration of the campaign. Therefore, sensitivity levels of the antennas and the radio sources and robustness values of VLBI antennas may be determined too optimistically.

According to our results, the internal reliability values of the observations and the sensitivity levels of the sources can be used to investigate the source quality together with the structure index. The sources can be excluded based on their sensitivity levels and structure indices. For this reason, it can be considered that robustness and sensitivity criteria can play a substantial role in scheduling in the future.

The software VieVS@GFZ was modified to determine the sensitivity levels and to detect the observations having a maximum effect on the coordinate unknowns. It can be used easily for routine analysis of VLBI sessions. However, to obtain the strain parameters and the robustness analysis, VieVS@GFZ should be further modified in the future.

#### **6. Conclusions**

In this research, we performed a quality assessment of VLBI observations during CONT14. The radio sources and the VLBI stations that took part in the CONT14 sessions were analyzed according to their sensitivity levels. Furthermore, a robustness analysis was applied for the antennas.

The controllability of one station through the other stations can be investigated by the sensitivity analysis. The location of the station in the network and the total weights of its observations are the most important contributors for the sensitivity. On the other hand, the total observation number of a radio source, and the quality of the observations are also important for the sensitivity levels of the radio sources. It was also found that the investigation of the relationship between the structure of radio source and their sensitivity level is of interest.

According to the robustness analysis of the station coordinates, all of the stations are robust against undetectable gross errors. Nevertheless, some stations such as FORTLEZA, WESTFORD, and TSUKUB32 have significantly worse robustness than the other stations. These worse robustness values may be due to the effects of the atmosphere, which vary strongly with time and with the location of the stations. Another explanation could be the remoteness of the station in the network.

**Author Contributions:** Analyzing and writing—original draft preparation, P.K.N.; review and editing, R.H., H.S., S.G., S.L., N.M., K.B., H.K., E.T.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** P.K.N. acknowledges the Scientific and Technological Research Council of Turkey (TÜBİTAK) for the financial support of the post-doctoral research program (2219). The International VLBI Service for Geodesy and Astrometry (IVS), [2] and [32], are acknowledged for providing data used within this study. We would like to thank all reviewers for the detailed comments which helped to improve the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Variance Reduction of Sequential Monte Carlo Approach for GNSS Phase Bias Estimation**

#### **Yumiao Tian 1,\*, Maorong Ge 2,3,\* and Frank Neitzel 2,\***


Received: 29 February 2020; Accepted: 18 March 2020; Published: 3 April 2020

**Abstract:** Global navigation satellite systems (GNSS) are an important tool for positioning, navigation, and timing (PNT) services. The fast and high-precision GNSS data processing relies on reliable integer ambiguity fixing, whose performance depends on phase bias estimation. However, the mathematic model of GNSS phase bias estimation encounters the rank-deficiency problem, making bias estimation a difficult task. Combining the Monte-Carlo-based methods and GNSS data processing procedure can overcome the problem and provide fast-converging bias estimates. The variance reduction of the estimation algorithm has the potential to improve the accuracy of the estimates and is meaningful for precise and efficient PNT services. In this paper, firstly, we present the difficulty in phase bias estimation and introduce the sequential quasi-Monte Carlo (SQMC) method, then develop the SQMC-based GNSS phase bias estimation algorithm, and investigate the effects of the low-discrepancy sequence on variance reduction. Experiments with practical data show that the low-discrepancy sequence in the algorithm can significantly reduce the standard deviation of the estimates and shorten the convergence time of the filtering.

**Keywords:** GNSS phase bias; sequential quasi-Monte Carlo; variance reduction

#### **1. Introduction**

Global navigation satellite systems (GNSS) are widely used in positioning, navigation, and timing (PNT) services. The accuracy of precise positioning can reach the level of centimeters and satisfies pervasive use in civil and military applications. GNSS is being developed at a fast pace, and the systems in full operation at present include the United States of America (USA)'s Global Positioning System (GPS) and Russia's Global Navigation Satellite System (GLONASS). The European Union plans to complete the construction of the Galileo system, while China is going to fully operate the BeiDou Navigation Satellite System (BDS) by 2020 [1,2]. The basic principle of GNSS data processing is to mathematically solve for the PNT parameters of interest in the observation models using measurements of the distances between GNSS satellites and receivers. However, biases in the signal measurements lead to errors in the models and degrade the accuracy of the solutions. Consequently, bias estimation plays an important role in the quality of the final PNT services [3–5]. Reducing the variance of the bias estimates can recover the measurements more precisely and improve the service quality.

Fast and precise GNSS data processing uses the carrier-wave phase measurement by the receivers. The phase measurement only records the fractional part of the carrier phase plus the cumulated numbers. Therefore, the phase measurements from GNSS receivers are not directly the satellite–receiver distance,

and an additional unknown integer ambiguity needs to be solved so that the distance can be recovered. Methods for solving the integer ambiguities were investigated in the past few decades, and some effective approaches such as the ambiguity function method (AFM) and the Least-squares AMBiguity Decorrelation Adjustment (LAMBDA) method were proposed, which are widely used in practice [6,7]. LAMBDA-based GNSS data processing is usually composed of four steps [6,8]. Firstly, the parameters of interest for PNT services are estimated together with the unknown ambiguities by the least-squares method or Kalman filtering. Secondly, the integer ambiguities are resolved according to the float estimates and variance–covariance (VC) matrix by decorrelation and searching methods. Thirdly, the searched integer ambiguities are validated to assure that they are the correct integers. Fourthly, the unknown parameters of interest for PNT services in the measurement models are derived with the validated integer ambiguities. Reliable ambiguity resolution is critical for fast and precise PNT services. The above steps work well when the errors of the measurements are small, but the performance degrades quickly when the errors grow. The errors of the phase measurements affect the solutions of the float ambiguities in the first step and destroy the integer nature of the ambiguities that are searched in the second step. As a result, the fixed integer vector cannot pass the validation in the third step and, thus, fast and precise GNSS PNT services become unreachable.

GNSS signals propagate through the device hardware when they are emitted by the satellites or received by the receivers, leading to time delays, i.e., biases. These biases act as errors in the measurements when they cannot be successfully estimated and, thus, block the success of ambiguity fixing. The difficulty in phase bias estimation lies in the correlation between the unknown bias and the ambiguity parameters. This correlation leads to rank-deficiency of the equation set and, thus, the parameters cannot be solved by the least-squares method or the Kalman filtering method in the first step of ambiguity fixing [9,10]. It should be noted that the estimation of some parameters leading to rank-deficiency in GNSS data processing can be avoided by techniques such as *S*-system theory [11]. However, those techniques focus on solving the estimable parameters and cannot help when the inestimable parameters are the ones of interest. In this case, if we want to estimate the bias parameter or the ambiguity parameter accurately, the conventionally inestimable parameter must be accurately known, which is a dilemma for GNSS researchers. Fortunately, Monte Carlo-based approaches have the potential to solve this dilemma [12,13]. Furthermore, the literature shows that the Monte Carlo method has also been used for ambiguity resolution without phase error estimation in attitude determination [14] and for code multipath mitigation with only code observations [15]. Those studies use different ideas in data processing and are not related to the topic of phase bias estimation.

The sequential Monte Carlo (SMC) method, or particle filtering, has been used in the state-space approach for time-series modeling since the basic procedure proposed by Gordon [16] (see the descriptions in References [16–18]). SMC mainly addresses problems with non-Gaussian and non-linear models, while it is rarely used in GNSS data processing. SMC can be regarded as Bayesian filtering implemented by the Monte Carlo method [19]. A state-space model, i.e., a hidden Markov model, can be described by two stochastic processes $\{x_t\}_{t=1}^{T}$ and $\{y_t\}_{t=1}^{T}$. The latent Markov process has initial density $x_0 \sim \mu(x_0)$ and Markov transition density $f(x_k|x_{k-1})$, while the observations $\{y_t\}_{t=1}^{T}$ satisfy the conditional marginal density $g(y_k|x_k)$. Bayesian filtering estimates the posterior density $P(x_k|y_{1:k}) = g(y_k|x_k)P(x_k|y_{1:k-1})/P(y_k|y_{1:k-1})$, where $P(y_k|y_{1:k-1})$ is a normalizing constant. The analytical solution for $P(x_k|y_{1:k})$ can be derived in some special cases, such as the Kalman filter for linear models with Gaussian noise. Otherwise, an analytical solution is not available, and Monte Carlo-based methods such as SMC can approximate it via random samples. The probability density of the variable is represented by weighted samples, and the estimate can be expressed by $\hat{x}_k = \frac{1}{N}\sum_{i=1}^{N} x_k^i$. According to References [20,21], SMC is mainly composed of three steps: the update step, which updates the weights of the particles according to $g(y_k|x_k)$; the resampling step, which avoids degeneracy (most particles having weights close to zero); and the prediction step, which transits the particles to the next epoch.

However, the pseudo-random sequences used in the Monte Carlo method can exhibit gaps and clusters. Quasi-Monte Carlo (QMC) replaces the random sequence with a low-discrepancy sequence, which reduces the variance and yields better performance [22–25]. To date, QMC-based variance reduction has not been addressed in GNSS data processing. This study aims to combine the GNSS data processing procedure with sequential QMC (SQMC) methods to obtain precise GNSS phase bias estimates. The paper first gives an overview of the mathematical problem in GNSS bias estimation and then provides a renewed algorithm introducing the QMC-based variance reduction method to precisely estimate the GNSS phase bias.

The remainder of this article is structured as follows: Section 2 presents the procedure and mathematical models of GNSS data processing and introduces the difficulties in phase bias estimation. Section 3 gives an overview of the QMC theory. Section 4 describes the proposed GNSS phase bias estimation algorithm based on the SQMC method. Section 5 gives the results of phase bias estimation with practical data, and Section 6 draws the key research conclusions.

#### **2. Mathematical Models of GNSS Precise Data Processing and the Dilemma**

The code and phase measurements are usually used to derive the parameters of interest for GNSS services. The code measurement is the range obtained by multiplying the traveling time of the signal, as it propagates from the satellite to the receiver, by the speed of light. The phase measurement is much more precise than the code measurement but is ambiguous by an unknown integer number of wavelengths when used as a range, and the ambiguities differ every time the receiver relocks the satellite signals [7].

In the measurement models, the unknowns include not only the coordinate parameters, but also the time delays caused by the atmosphere and device hardware, as well as the ambiguities for phase measurement. In relative positioning, the hardware delays can be nonzero values and should be considered in multi-GNSS and GLONASS data processing, i.e., inter-system bias (ISB) [9,10,26] and inter-frequency bias (IFB) [27], respectively. The ISB and IFB of the measurements are correlated with the ambiguities and are the key problems to be solved.

The double difference (DD) measurement models are usually constructed to mitigate the common errors of two non-difference measurements. For short baselines, the DD measurement models, including the parameters of interest for GNSS PNT services such as coordinates for positioning, the unknown ambiguities, and the ISB or IFB parameters, can be written in matrix form as

$$v = Ax + Db + Cy - l,\tag{1}$$

where $v$ denotes the vector of observation residuals; $b$ is composed of the unknown single difference (SD) ambiguities $N_{ab}^{i1}, N_{ab}^{i2}, \ldots, N_{ab}^{in}$, where $i$ is the reference satellite, $n$ is the number of DD equations, and $a$ and $b$ are the stations; $y$ includes the ISB and the IFB rate; vector $x$ contains the unknown station coordinates and the other parameters of interest; $l$ contains the measurements from the receivers; $A$ is the design matrix of the elements in $x$; $D$ is the design matrix with elements of zero and the corresponding carrier wavelength, and it transforms SD ambiguities to DD ambiguities; $C$ is the design matrix of $y$, with elements of zero and the SD of the channel numbers for the phase IFB rate parameter, and with elements of zero and one for the phase ISB parameter.

GNSS data processing such as for precise positioning is used to precisely determine the elements in *x*. Denoting the weight matrix of the DD measurements [7] by *P*, the normal equation of the least-squares method is

$$
\begin{bmatrix}
A^T P A & A^T P D & A^T P C \\
& D^T P D & D^T P C \\
\text{sym} & & C^T P C
\end{bmatrix}
\begin{bmatrix}
x \\
b \\
y
\end{bmatrix} =
\begin{bmatrix}
A^T P l \\
D^T P l \\
C^T P l
\end{bmatrix}.
\tag{2}
$$

For simplification, the notation in Equation (3) is used.

$$
\begin{bmatrix}
N_{xx} & N_{xb} & N_{xy} \\
& N_{bb} & N_{by} \\
\text{sym} & & N_{yy}
\end{bmatrix}
\begin{bmatrix}
x \\
b \\
y
\end{bmatrix} =
\begin{bmatrix}
w_x \\
w_b \\
w_y
\end{bmatrix}.
\tag{3}
$$

If the bias vector $y$ is precisely known, the estimation of $x$ can be realized by the following four steps.

**Step 1**: Derive the solution of *x* and *b* with float SD-ambiguities by least-squares method.

$$
\begin{bmatrix}
\hat{x} \\
\hat{b}
\end{bmatrix} =
\begin{bmatrix}
N_{xx} & N_{xb} \\
N_{bx} & N_{bb}
\end{bmatrix}^{-1}
\begin{bmatrix}
w_x - N_{xy}y \\
w_b - N_{by}y
\end{bmatrix} =
\begin{bmatrix}
Q_{xx} & Q_{xb} \\
Q_{bx} & Q_{bb}
\end{bmatrix}
\begin{bmatrix}
w_x - N_{xy}y \\
w_b - N_{by}y
\end{bmatrix}.
\tag{4}
$$

After the float SD ambiguities in $b$ are estimated, the SD ambiguities and their VC matrix are transformed into the DD ambiguities $\hat{b}_{DD}$ and the corresponding VC matrix $Q_{\hat{b}\hat{b}}$ by differencing.

**Step 2**: Fix the integer ambiguities. The elements of $\hat{b}_{DD}$ are intrinsically integers, but the calculated values are floats. Resolving the float values to integers can improve the accuracy to the sub-centimeter level with fewer observations [28]. The ambiguity resolution can be expressed by

$$
\check{b} = F(\hat{b}),
\tag{5}
$$

where function $F(\cdot)$ maps the ambiguities from float values to integers. This process can be implemented by the LAMBDA method [6,8], which efficiently mechanizes the integer least-squares (ILS) procedure [29]. The method solves the ILS problem described by

$$\min \left(\hat{b} - \overline{b}\right)^{T} Q_{\hat{b}\hat{b}}^{-1} \left(\hat{b} - \overline{b}\right), \text{ with } \overline{b} \in \mathbb{Z}^{n}, \tag{6}$$

where $\overline{b}$ denotes the vector of integer ambiguity candidates. The LAMBDA procedure mainly contains two steps: the reduction step and the search step. The reduction step decorrelates the elements in $\hat{b}$ and orders the diagonal entries by $Z$-transformations to shrink the search space. The search step finds the optimal ambiguity candidates in a hyper-ellipsoidal space.

**Step 3:** Validate the integer ambiguities. The obtained ambiguity combination $\check{b}$ is not guaranteed to be the correct integer ambiguity vector and must be validated. The R-ratio test [29,30] can be employed. This test tries to ensure that the best ambiguity combination, which is the optimal solution of Equation (6), is statistically better than the second-best one. The ratio value is calculated by

$$RATIO = \left\|\hat{b} - \check{b}'\right\|^2_{Q_{\hat{b}\hat{b}}} \Big/ \left\|\hat{b} - \check{b}\right\|^2_{Q_{\hat{b}\hat{b}}}, \tag{7}$$

where $\check{b}'$ is the second-best ambiguity vector according to Equation (6). The integer ambiguity vector $\check{b}$ is accepted if the ratio value is equal to or larger than a threshold, and it is rejected if the ratio value is smaller than the threshold.
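The validation step can be sketched numerically. The following is a minimal illustration of the R-ratio test of Equation (7), not the full LAMBDA search; the float ambiguities, integer candidates, weight matrix, and the threshold of 3.0 are all hypothetical values chosen only for demonstration.

```python
import numpy as np

def r_ratio_test(b_float, b_best, b_second, Q_inv, threshold=3.0):
    """R-ratio test (Equation (7)): accept the best integer candidate only if
    the second-best candidate is sufficiently worse in the Q-weighted metric."""
    d_best = b_float - b_best
    d_second = b_float - b_second
    ratio = (d_second @ Q_inv @ d_second) / (d_best @ Q_inv @ d_best)
    return ratio, ratio >= threshold

# Toy example: float ambiguities lying close to the best integer vector.
b_float = np.array([3.04, -1.98, 7.01])
b_best = np.array([3.0, -2.0, 7.0])
b_second = np.array([3.0, -2.0, 8.0])
Q_inv = np.eye(3)  # identity weight matrix, for illustration only
ratio, accepted = r_ratio_test(b_float, b_best, b_second, Q_inv)
```

Here the second-best candidate is far worse than the best one, so the ratio is large and the fix is accepted.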

**Step 4**: Derive the fixed baseline solution. After the integer ambiguity vector passes the validation test, *b*ˇ is used to adjust the float solution of other parameters, leading to the corresponding fixed solution. This process can be expressed by

$$\check{x} = \hat{x} - Q_{\hat{x}\hat{b}} Q_{\hat{b}\hat{b}}^{-1} \left(\hat{b} - \check{b}\right), \tag{8}$$

$$Q_{\check{x}\check{x}} = Q_{\hat{x}\hat{x}} - Q_{\hat{x}\hat{b}} Q_{\hat{b}\hat{b}}^{-1} Q_{\hat{b}\hat{x}}, \tag{9}$$

where $\check{x}$ denotes the fixed solution of $x$; $Q_{\check{x}\check{x}}$ is the VC matrix of the fixed solution $\check{x}$; $Q_{\hat{b}\hat{x}}$ is the covariance matrix of $\hat{b}$ and $\hat{x}$; $\hat{b}$ refers to the float ambiguity solution; $\hat{x}$ is the float solution of $x$.

The fixed solution $\check{x}$ can reach sub-centimeter accuracy. If errors in the observation models are removed, successful ambiguity fixing requires observations from only a few epochs, or even a single epoch.
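The conditioning step of Equations (8) and (9) can be sketched with NumPy. All numerical values below are hypothetical, chosen only to illustrate how fixing the ambiguities shrinks the covariance of the coordinate solution.

```python
import numpy as np

# Hypothetical float solution and covariances (illustrative values only).
x_hat = np.array([1.234, 5.678])      # float coordinate solution
b_hat = np.array([3.2, -1.9])         # float DD ambiguities
b_check = np.array([3.0, -2.0])       # validated integer ambiguities
Q_xx = np.array([[4.0, 0.5], [0.5, 3.0]]) * 1e-4
Q_xb = np.array([[1.0, 0.2], [0.1, 0.8]]) * 1e-4
Q_bb = np.array([[2.0, 0.3], [0.3, 1.5]]) * 1e-4

# Equation (8): condition the float solution on the fixed ambiguities.
K = Q_xb @ np.linalg.inv(Q_bb)
x_check = x_hat - K @ (b_hat - b_check)

# Equation (9): the fixed solution has a reduced covariance (Q_bx = Q_xb^T).
Q_xcheck = Q_xx - K @ Q_xb.T
```

The diagonal of `Q_xcheck` is strictly smaller than that of `Q_xx`, which is the formal gain from a successful ambiguity fix.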

From Equations (5) and (6), successful ambiguity resolution requires accurate float ambiguity estimates and the corresponding VC matrix. If the bias in $y$ is unknown, it cannot be separated by Equation (4) but stays with the ambiguity parameters. As a result, the obtained ambiguity parameters include the bias, and the ambiguity resolution will fail. When both the bias and the ambiguities are parameterized simultaneously, the bias parameter is correlated with the ambiguity parameters and, thus, it is impossible to separate the bias values and obtain precise float ambiguity estimates. Mathematically, the normal equation set (Equation (2)) becomes rank-deficient and cannot be solved.

#### **3. Monte Carlo-Based Algorithms**

#### *3.1. Bayes Filtering*

In a state-space model, the transition function and the measurement model can be expressed by

$$x_k = f_k(x_{k-1}, \varepsilon_k), \tag{10}$$

$$y_k = h_k(x_k, e_k), \tag{11}$$

where $y_k$ is the measurement vector at epoch $k$; $f_k(\cdot)$ is the transition function; $h_k(\cdot)$ is the measurement function; $\varepsilon_k$ and $e_k$ are the state noise and the measurement noise, respectively. This model describes a first-order Markov process because the estimated state vector is related only to the state of the previous epoch $k-1$, not to earlier states.

Considering the Chapman–Kolmogorov equation [31],

$$p(\mathbf{x}\_k | y\_{1:k-1}) = \int p(\mathbf{x}\_k | \mathbf{x}\_{k-1}) \, p(\mathbf{x}\_{k-1} | y\_{1:k-1}) d\mathbf{x}\_{k-1},\tag{12}$$

the posterior density $p(x_k|y_{1:k})$ can be estimated according to Bayes's theorem by

$$p(\mathbf{x}\_k | y\_{1:k}) = p(y\_k | \mathbf{x}\_k) p(\mathbf{x}\_k | y\_{1:k-1}) / p(y\_k | y\_{1:k-1}).\tag{13}$$

The expectation of *x* can, hence, be expressed by

$$
\hat{x} = \int x_k\, p(x_k|y_{1:k})\, dx_k. \tag{14}
$$

Combining Equations (12) and (13), the estimate of $x$ at each epoch can in principle be calculated. The optimal analytical expression for $p(x_k|y_{1:k})$ can be derived for a linear Gauss–Markov model, as in the Kalman filter, but cannot be obtained in most cases. Fortunately, suboptimal solutions by the Monte Carlo method are usually available.
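For a one-dimensional state, the recursion of Equations (12)–(14) can even be evaluated on a grid without any sampling, which makes the Bayes filter concrete. The sketch below assumes a random-walk state observed in Gaussian noise; the true state, noise levels, and grid are arbitrary illustrative choices.

```python
import numpy as np

# Grid approximation of the Bayes filter, Equations (12)-(14).
grid = np.linspace(-5.0, 5.0, 401)
dx = grid[1] - grid[0]
gauss = lambda m, s: np.exp(-0.5 * ((grid - m) / s) ** 2)

prior = gauss(0.0, 2.0)
prior /= prior.sum() * dx            # initial density mu(x0)

true_x, obs_sigma, trans_sigma = 1.0, 0.2, 0.05
rng = np.random.default_rng(3)

for _ in range(10):
    # Prediction, Eq. (12): convolve with the transition density f(xk | xk-1).
    pred = np.convolve(prior, gauss(0.0, trans_sigma), mode="same")
    # Update, Eq. (13): multiply by the likelihood g(yk | xk) and renormalize.
    y = true_x + rng.normal(0.0, obs_sigma)
    post = pred * np.exp(-0.5 * ((y - grid) / obs_sigma) ** 2)
    post /= post.sum() * dx
    prior = post

est = np.sum(grid * post) * dx       # posterior mean, Eq. (14)
```

After a few epochs the posterior mean settles near the true state; the grid trick only works in low dimensions, which is why sampling-based SMC is needed in general.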

#### *3.2. Importance Sampling*

In Monte Carlo methods, the probability density function (PDF) $p(x_k|y_{1:k})$ is represented by $N$ samples $\{x^i\}_{i=1}^{N}$; therefore,

$$p\left(x_k|y_{1:k}\right) \approx \frac{1}{N} \sum_{i=1}^{N} \delta\left(x_k - x^i\right) \Rightarrow \hat{x} \approx E\left(\left\{x^i\right\}_{i=1}^{N}\right), \tag{15}$$

where $\delta(\cdot)$ is the Dirac delta function.

The posterior density is not precisely known at the beginning of epoch $k$. Assume a prior PDF $q(x)$ is known from which samples can be generated, and let $p(x)$ be the posterior density to be estimated. After the samples are generated, they can be used to incorporate the information provided by the measurements at epoch $k$ and obtain a more precise posterior density. The expectation of the unknown state vector can be calculated by

$$x^* = \int x\, p(x)\, dx = \int x\, \frac{p(x)}{q(x)}\, q(x)\, dx = \int x\, w(x)\, q(x)\, dx, \tag{16}$$

where *w*(*x*) = *p*(*x*)/*q*(*x*); *q*(*x*) is the importance density function.

As the PDFs are represented via the Monte Carlo method and according to the Bayes's theorem, the expectation can be written as

$$\hat{x} \approx \frac{1}{N} \sum_{i=1}^{N} w^i x^i, \tag{17}$$

where the weight $w^i$ of sample $x_k^i$ is updated recursively by $w_k^i \propto w_{k-1}^i\, p\left(y_k|x_k^i\right) p\left(x_k^i|x_{k-1}^i\right) \big/ q\left(x_k^i|x_{k-1}^i, y_k\right)$.
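A minimal static example makes the importance-sampling idea of Equations (16) and (17) tangible. The target and importance densities below are arbitrary Gaussians chosen for illustration: we estimate the mean of $p = N(2, 1)$ using samples drawn from the wider $q = N(0, 2^2)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
N = 200_000

# Importance density q(x) = N(0, 2^2); target density p(x) = N(2, 1).
samples = rng.normal(0.0, 2.0, N)
w = norm.pdf(samples, loc=2.0, scale=1.0) / norm.pdf(samples, loc=0.0, scale=2.0)
w /= w.sum()                    # self-normalized weights w(x) = p(x)/q(x)
x_hat = np.sum(w * samples)     # weighted estimate of E_p[x] = 2
```

Although no sample was drawn from $p$ itself, the weighted average recovers its mean; the same reweighting, applied epoch by epoch, is the core of SIS.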

#### *3.3. Sequential Monte Carlo Algorithm*

A sequential importance sampling (SIS) procedure can be implemented to obtain estimates of $x$. Firstly, the sample set $\{x_0^i\}_{i=1}^{N}$ is generated with equal weights according to the initial distribution $q(x_0)$. Then, the samples are reweighted according to the likelihood of the measurements, and the estimate $\hat{x}$ is calculated. Afterward, the samples are transited to the next epoch. In practice, SIS quickly degenerates during filtering as more and more particles get negligible weights. Fortunately, the degeneracy problem can be solved by resampling, which duplicates the samples with large weights and deletes the samples with small weights.

The resampling step is implemented as follows: (a) the numerical cumulative distribution function (CDF) $\{W_k^i\}_{i=1}^{N}$ of the weights is constructed with $W_k^i = \sum_{j=1}^{i} w_k^j$; (b) the set $\{u^i\}_{i=1}^{N}$ is generated by $u^i = ((i-1) + u)/N$, where $u$ is a random value over the interval $[0, 1)$; (c) starting with $m = 1$, for each $i = 1, \ldots, N$, while $W_k^m < u^i$, the sample $x_k^m$ is discarded by setting $m = m + 1$; otherwise, $x_k^m$ is duplicated by setting $x_k^i = x_k^m$; (d) the new sample set $\{x_k^i\}_{i=1}^{N}$ is assigned equal weights.
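Steps (a)–(d) above amount to systematic resampling and can be sketched compactly; the four-particle weight vector below is a made-up example.

```python
import numpy as np

def systematic_resample(particles, weights, rng):
    """Systematic resampling, steps (a)-(d): duplicate heavy particles,
    drop light ones, and return equal weights."""
    N = len(weights)
    W = np.cumsum(weights)                     # (a) numerical CDF of the weights
    u = (np.arange(N) + rng.random()) / N      # (b) stratified positions in [0, 1)
    idx = np.searchsorted(W, u)                # (c) CDF segment each u falls into
    idx = np.minimum(idx, N - 1)               # guard against round-off at the top
    return particles[idx], np.full(N, 1.0 / N)  # (d) equal weights afterward

rng = np.random.default_rng(1)
particles = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.05, 0.05, 0.8, 0.1])
new_particles, new_w = systematic_resample(particles, weights, rng)
```

The particle carrying 80% of the weight is duplicated several times, while the negligible ones tend to disappear.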

It is not necessary to resample at every epoch; a condition can be set by comparing the effective sample size with a threshold. This resampling procedure adequately solves the degeneracy problem of SIS in practice. Including the resampling step, the SMC procedure can be implemented as Algorithm 1.


#### *3.4. Sequential Quasi-Monte Carlo Algorithm*

The pseudo-random numbers usually used in the Monte Carlo method can exhibit gaps and clusters in the sampling fields. This can be avoided by a QMC method, which replaces the random sequences with low-discrepancy sequences such as the Sobol sequence, the Halton sequence, and so on [32,33].

The QMC sequence is a deterministic sequence uniformly distributed over $[0, 1]^d$. Let $\{u^i\}_{i=1}^{N}$ be a sequence of vectors in $[0, 1]^d$ and $[0, x) = \prod_{j=1}^{d} [0, x^j)$ be a subinterval of $[0, 1]^d$, with $\#\{u^i \in [0, x)\}$ denoting the number of points of the sequence falling into this subinterval. The discrepancy of the sequence is defined by

$$D^*\left(\left\{u^i\right\}_{i=1}^{N}\right) = \sup_{x \in [0,1]^d} \left| \frac{\#\{u^i \in [0, x)\}}{N} - \prod_{j=1}^{d} x^j \right|, \tag{18}$$

where sup refers to the supremum. A sequence with small discrepancy as defined by the above formula is called a low-discrepancy sequence. The Koksma–Hlawka theorem indicates that the approximation error of a real function represented by discrete numbers, i.e., the left side of Equation (19), is bounded by the product of two independent factors: the variation of the real function and the discrepancy of the discrete numbers. Therefore, we have the following inequality:

$$\left| \frac{1}{N} \sum_{i=1}^{N} f\left(u^i\right) - \int_{[0,1]^d} f(u)\, du \right| \le V(f)\, D^*\left(\left\{u^i\right\}_{i=1}^{N}\right), \tag{19}$$

where $V(f)$ is the variation of $f$ in the sense of Hardy and Krause [34]. A low-discrepancy sequence has $D^*\left(\{u^i\}_{i=1}^{N}\right) = O\left(N^{-1} \ln^d N\right)$. This indicates that the QMC estimate for numerical integration has an error bound of $O\left(N^{-1} \ln^d N\right)$, which is better than the probabilistic error bound $O\left(N^{-1/2}\right)$ of the MC estimate. This improves the convergence and enables efficient computing.
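This convergence advantage is easy to check numerically. The sketch below, using SciPy's `scipy.stats.qmc` module, compares a plain MC estimate against a Sobol-based QMC estimate of a known integral; the integrand $f(u) = u_1 u_2$ (true value $1/4$) and the point count are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import qmc

# Integrate f(u) = u1 * u2 over [0, 1]^2 (true value 0.25) with MC and QMC.
f = lambda u: u[:, 0] * u[:, 1]
N = 256  # power of two, as recommended for Sobol points

mc_points = np.random.default_rng(0).random((N, 2))
qmc_points = qmc.Sobol(d=2, scramble=False).random(N)

mc_est = f(mc_points).mean()
qmc_est = f(qmc_points).mean()
```

With the same number of points, the Sobol estimate is typically orders of magnitude closer to 0.25 than the pseudo-random one, reflecting the $O(N^{-1} \ln^d N)$ versus $O(N^{-1/2})$ bounds.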

It is difficult to analyze the accuracy of the QMC approximation in practice because the points are regular. Therefore, randomized QMC (RQMC) can be used so that every element of the sequence is uniformly distributed over the unit cube but still has the low-discrepancy property [35–37]. Figure 1 shows the first 200 samples of the sequences for RQMC sampling, pseudo-random sampling, and the corresponding Gaussian sampling with a Sobol sequence. The SQMC algorithm is presented in Algorithm 2.

**Figure 1.** The 200 points investigated for the randomized quasi-Monte Carlo (RQMC) sampling, pseudo-random sampling, and the corresponding Gaussian sampling with a Sobol sequence.


#### **4. SQMC-Based Algorithm for Phase Bias Estimation**

The ratio for integer ambiguity validation in Step 3 of Section 2 reflects the closeness of the float ambiguity vector to the integer ambiguity vector and, thus, indicates the quality of the ambiguity fixing. If the ambiguities are successfully fixed to the correct integers with high probability, the phase bias can be precisely derived. Although the searching and validation steps cannot be linearized to satisfy the conditions for the linear least-squares method, we can rely on Monte Carlo-based methods to develop algorithms for precise phase bias estimation.

If $b_k$ is the correct ambiguity vector and $x_k$ represents the phase bias parameters at epoch $k$, we can assume that the conditional probability density $p(b_k|x_k)$ is proportional to the ratio value, $p(b_k|x_k) \propto ratio(x_k)$, and simply let

$$p\left(\mathbf{b}\_{k}|\mathbf{x}\_{k}^{i}\right) = ratio\left(\mathbf{x}\_{k}^{i}\right) / \sum\_{i=1}^{N} ratio\left(\mathbf{x}\_{k}^{i}\right). \tag{20}$$

The PDF *p*(*bk*|*xk*) is then used as the likelihood function in the Monte Carlo-based estimation method to update the weights. This is expressed as

$$\overline{w}_k^i = w_{k-1}^i\, p\left(b_k|x_k^i\right) = w_{k-1}^i\, ratio\left(x_k^i\right) \Big/ \sum_{i=1}^{N} ratio\left(x_k^i\right). \tag{21}$$

This designed likelihood function works for the estimation of the phase biases which affect the ratio values in ambiguity fixing.

The following procedure is implemented to calculate $ratio\left(x_k^i\right)$ at epoch $k$ for each element in the sample set $\{x_k^i\}_{i=1}^{N}$: (a) $x_k^i$ is used as the known bias value to calibrate the measurement model by Equation (4), and the equation set is solved to obtain the float SD ambiguities and the corresponding VC matrix; (b) the DD ambiguities and their VC matrix are calculated, and the integer ambiguities are fixed using the LAMBDA method; (c) $ratio\left(x_k^i\right)$ is calculated using Equation (7).

Moreover, the phase bias can be regarded as constant between epochs, and the transition function which transports samples from epoch *k* − 1 to epoch *k* is

$$\mathbf{x}\_{k}^{j} = \mathbf{x}\_{k-1}^{j} + \boldsymbol{v}^{j},\tag{22}$$

where $v^j$ is normally distributed noise, with each element following $N(0, \sigma)$, $\sigma$ being the standard deviation of the transition noise.

The flowchart of the SQMC procedure for phase bias estimation is plotted in Figure 2, and the corresponding algorithm is presented as Algorithm 3.

**Figure 2.** Flowchart of the global navigation satellite system (GNSS) phase bias estimation. The blue boxes refer to the GNSS data processing steps and the yellow boxes are related to SQMC.


Algorithm 3 combines the QMC method and the GNSS ambiguity fixing procedures together to estimate the GNSS phase bias. The low-discrepancy sequences of QMC are included for variance reduction. Section 5 shows the applications of the approach with practical GNSS data.
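The overall loop of Algorithm 3 can be sketched as follows. This is a toy sketch, not the full algorithm: the `ratio` function below is a mock stand-in for the LAMBDA-based evaluation of Equation (7) (in reality each call calibrates Equation (4) with the trial bias and runs ambiguity fixing), and the true bias, prior interval, sample count, and transition sigma are all assumed illustrative values.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
TRUE_BIAS = -0.0295  # m/FN, stands in for the post-processed IFB of Section 5

def ratio(x):
    """Mock R-ratio of Equation (7): a peak around the true bias plus a floor.
    A hypothetical surrogate for the full LAMBDA fixing-and-validation step."""
    return np.exp(-0.5 * ((x - TRUE_BIAS) / 0.002) ** 2) + 0.1

N, sigma_trans, epochs = 100, 0.001, 20

# Initialization: low-discrepancy (scrambled Sobol) samples over a prior interval.
u = qmc.Sobol(d=1, scramble=True, seed=1).random(N).ravel()
x = -0.05 + 0.05 * u            # assumed prior interval [-0.05, 0.0] m/FN
w = np.full(N, 1.0 / N)

for _ in range(epochs):
    w = w * ratio(x)                            # weight update, Eq. (21)
    w /= w.sum()
    est = np.sum(w * x)                         # weighted bias estimate, Eq. (17)
    # Systematic resampling (steps (a)-(d) of Section 3.3).
    idx = np.searchsorted(np.cumsum(w), (np.arange(N) + rng.random()) / N)
    x = x[np.minimum(idx, N - 1)]
    w = np.full(N, 1.0 / N)
    x = x + rng.normal(0.0, sigma_trans, N)     # prediction, Eq. (22)
```

Even with this crude mock likelihood, the samples concentrate around the true bias within a few epochs, which mirrors the convergence behavior reported in Section 5.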

#### **5. Experiments with Practical GNSS Data**

The GLONASS phase IFB estimation for baseline GOP6\_GOP7 in the networks of the International GNSS Service (IGS) (ftp://ftp.cddis.eosdis.nasa.gov/pub/gnss) was taken as an example to demonstrate the variance reduction by QMC. The baseline is located in Europe (Figure 3), and the two GNSS stations were equipped with LEICA GRX1200+GNSS 9.20 and TRIMBLE NETR9 5.01 receivers, respectively. The measurement data were collected at GPS time (GPST) 9:00–10:00 a.m. on day of year (DOY) 180 of 2018 with an epoch interval of 30 s. Six GLONASS satellites were observed during the time span, and the satellite slot numbers are shown at the beginning of the satellite trajectories in Figure 4. The baseline had a post-processed GLONASS phase IFB of around −29.5 mm/frequency number (FN), which can be regarded as the true value for both the L1 and L2 frequencies.

**Figure 3.** Location of the GNSS baseline GOP6\_GOP7 in Europe.

**Figure 4.** Observed Russia's Global Navigation Satellite System (GLONASS) satellites shared by the two stations of baseline GOP6\_GOP7 at GPST 9:00–10:00 a.m. on day of year (DOY) 180 of 2018. The satellite slot numbers in blue color are marked at the beginning of the satellite trajectories.

Both the code and carrier phase measurements on frequencies L1 and L2 were used to form Equation (1) in the experiment. Only one IFB parameter was included because the IFB values for frequencies L1 and L2 were regarded as identical. Algorithm 3 for SQMC-based phase bias estimation was implemented, and the IFB estimate was derived at each epoch. Furthermore, the IFB estimates were also calculated using the SMC-based approach for comparison. When the bias was estimated many times, the estimates differed between runs because the RQMC sequence was used in the SQMC-based procedure and the pseudo-random sequence was used in the SMC-based procedure. The standard deviation (SD) of the estimates was calculated to evaluate the performance.

Firstly, the IFB was estimated 1000 times with the SMC-based approach. The sigma of the transition noise in Equation (22) was set to 1 mm/FN, and the sample number was fixed to 100. The 1000 estimates of the IFB for the first 60 epochs are drawn in Figure 5 as yellow lines. Afterward, the SQMC-based approach with a Sobol sequence for IFB estimation was implemented, likewise with a transition-noise sigma of 1 mm/FN and 100 samples. The IFB was again estimated 1000 times, and the results for the first 60 epochs are plotted in Figure 5 as blue lines. The SDs of the 1000 estimates of the SMC and SQMC approaches were calculated and are presented in Figure 5 as a yellow line and a blue line, respectively.

**Figure 5.** Inter-frequency bias (IFB) estimates with 1000 iterations and their SDs based on SQMC-based and SMC-based algorithms with data of baseline GOP6\_GOP7 collected at GPST 9:00–9:30 a.m. on DOY 180 of 2018.

Obviously, both approaches could successfully estimate the IFB values. After convergence, the estimates of the two algorithms were very similar, e.g., after the 40th epoch. However, the estimates of the SQMC-based algorithm converged faster than those of the SMC approach, and the corresponding SD was much lower at the beginning. In the worst case of the 1000 runs, the results converged, i.e., came within 1.5 mm/FN of the true value, at the seventh epoch for SQMC but only at the 40th epoch for SMC.

The variance of the estimates was also analyzed with varying sample numbers and transition noise.

Firstly, the phase IFB was estimated 100 times using the SMC-based and SQMC-based algorithms separately, with the number of particles varied from 30 to 200. The sigma of the transition noise was fixed to 1 mm/FN for each estimation. The SDs of the 100 estimates at epochs 1, 2, 5, and 10 are presented in Figure 6, where we can see that the SD decreased for both SMC and SQMC as the number of particles increased. The SD for SQMC was much smaller than that for SMC. This indicates that the SQMC-based algorithm could achieve estimates with a smaller SD than the SMC-based algorithm even when using fewer samples. This is very meaningful in GNSS data processing, because the main time-consuming step is the ambiguity fixing procedure of Step 2 in Section 2, performed for each sample. Fewer samples result in a lighter computation load.

**Figure 6.** SDs of the 100 estimates based on SQMC-based and SMC-based algorithms, separately, for different number of particles from 30 to 200 at epochs 1, 2, 5, and 10. The data are for baseline GOP6\_GOP7 collected at GPST 9:00–9:30 a.m. on DOY 180 of 2018.

Then, the effects of the transition noise were evaluated. The phase IFB was estimated another 100 times with the sigma of the transition noise varied from 10<sup>−6</sup> to 10<sup>−2</sup> m/FN and the number of samples fixed to 100. The SDs of the 100 estimates were calculated at each epoch, and the values at epochs 1, 2, 5, and 10 are plotted in Figure 7. Obviously, the SDs of the SQMC-based algorithm were much smaller than those of the SMC-based algorithm at all four epochs. The SDs at epoch 10 showed a bump near a sigma of 10<sup>−3</sup> m/FN and were larger than the SDs corresponding to nearby sigma values. This indicates that the transition noise level in the transition model needs to be set carefully. Too high a transition noise will increase the SDs; however, if the sigma is too small, the samples cannot evolve to the proper region and the prior density cannot be well represented by the samples, also leading to large SD values.

**Figure 7.** SD of the 100 estimates based on SQMC-based and SMC-based algorithms, separately, for different transition noise with sigma from 10−<sup>6</sup> to 10−<sup>2</sup> m/FN at epochs 1, 2, 5, and 10. The data are for baseline GOP6\_GOP7 collected at GPST 9:00–9:30 a.m. on DOY 180 of 2018.

#### **6. Conclusions**

This article presented the problem of solving the mathematical models in GNSS phase bias estimation, which is essential for fast and precise GNSS data processing; it then developed a fast and efficient algorithm by combining the GNSS data processing procedure and the SQMC method.

The proposed SQMC-based GNSS phase bias estimation algorithm introduces the QMC technique into the SMC-based algorithm for variance reduction. The random sequences of the SMC approach are replaced by low-discrepancy sequences in the sampling of the initialization and prediction steps. We performed experiments of phase IFB estimation for GLONASS data processing with practical data. The results show that the GNSS phase bias estimates based on the low-discrepancy sequence have a much smaller variance. The estimation converges faster, and the results are more precise even with a much smaller number of samples. This can largely reduce the convergence time and the computation load of GNSS PNT services.

In addition, this study introduces the difficulties of GNSS data processing to the mathematics community, and it has the potential to boost new numerical algorithms for GNSS research and applications.

**Author Contributions:** Conceptualization, Y.T., M.G., and F.N.; methodology, Y.T., M.G., and F.N.; writing—original draft preparation, Y.T.; writing—review and editing, M.G. and F.N. All authors read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (No. 41804022) and the Fundamental Research Funds for the Central Universities (No. 2682018CX33).

**Acknowledgments:** The authors thank IGS MGEX for providing GNSS data.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

