1. Introduction
The critical zone and Earth’s crust are constantly monitored across ecological and geological disciplines due to their importance to terrestrial life, for industrial applications and for advancing science. Remote sensing methods are commonly deployed, in which physical energy fields are recorded on or above the Earth’s surface, and are used to infer structure and properties of the subsurface. In particular, seismic interferometry is a powerful technique which transforms previously discarded data, such as seismic energy from earthquakes or from the ambient background noise field, into useful signals that remotely illuminate subsurface Earth structures [
1,
2,
3]. The origin of seismic interferometry can be traced back to the seminal work of Claerbout [
4] who showed that the reflection response of a horizontally layered medium could be estimated from the autocorrelation of its transmission response. Seismic, or more generally wavefield interferometry has since become a rapidly evolving field of research [
5,
6,
7,
8], leading to fundamental advances in our ability to image the Earth’s crust at global [
9,
10], regional [
11,
12,
13] and industrially relevant scales [
14,
15,
16].
In general, interferometric methods rely on cross-correlating, convolving or deconvolving pairs of recorded signals to extract information about the medium. The goal is to estimate signals which would have been acquired if the receivers [
17,
18] or sources [
1,
2,
19] been deployed at different locations or at different times [
20,
21,
22,
23]. Theoretically, the response of a medium at a given location to an impulsive source at a different location is described by Green’s functions which are associated with specific equations describing the wave dynamics. The purpose of seismic interferometry is usually to estimate various approximations to these Green’s functions.
Two theoretical assumptions that underpin many interferometric methods are that recorded energy comes from sources that are distributed isotropically in space around the receivers, and that the time series emitted by the sources are statistically uncorrelated in time between pairs of sources. The latter assumption precludes sources that are spatio-temporally correlated. However, it is often necessary or even desirable to rely on noise sources that are already present in the environment. This is especially true in areas of the Earth where environmental constraints preclude the use of active artificial seismic sources, or when we wish to illuminate large volumes of the Earth that require more powerful sources than can be generated artificially. Sources of freely available ambient noise abound in the Earth and interferometry has been successfully performed by cross-correlating earthquake codas (the long tail of energy that is recorded after the initial impulses from first-arriving seismic waves) which are assumed to act as an approximately diffuse, reverberating wavefield approaching receivers from all directions [
1], or using recorded wavefields assumed to come from isotropic noise fields [
2,
3,
24,
25]. Many known physical noise sources that contribute to ambient noise are in motion, e.g., storm sources, ocean waves or wind [
26], in addition to seismic energy deriving from man-made activities such as shipping [
27] or noise from traffic. Recently it has been recognized that highway and/or railway traffic can comprise a dominant component of the ambient noise field, and interferometry has been applied to highway traffic noise [
28,
29,
30], railway noise [
31,
32,
33,
34,
35], and noise generated from waves breaking along coastlines [
36], as illustrated in
Figure 1. However, all these sources of ambient noise are inherently correlated in time, which is in direct contradiction to the theoretical assumptions outlined above. Although different methodologies for mitigating non-ideal source distributions have been considered, for example in [
37,
38,
39,
40], the effects of statistical correlation in ambient noise sources were not considered, particularly in the important case of correlation induced by motion of the source. The impact that source motion has on cross-correlational interferometry was investigated in [
41] but under the standard assumption of statistically uncorrelated ambient noise. Importantly, the extent of the error in the interferometric estimates of the Green’s function due to the assumption of uncorrelated noise sources has so far not been quantified, and no general methods to reduce these errors have been published.
In this paper, we present a unified theory of inter-receiver seismic interferometry that encompasses correlated and uncorrelated noise sources in both the near and far fields, including the case of moving sources. Our theoretical framework allows us to develop a novel workflow that mitigates the spurious effects arising from the use of correlated noise sources, leading to the retrieval of the Green’s function from a short-time recording of a single correlated moving noise source. Moreover, our framework provides a systematic insight into the mechanisms through which the correlation in the sources induces errors in the estimates of the Green’s function and its phase. We quantify these errors and show that they have a significant effect for commonly used noise sources, with spurious effects that obscure the estimates of phase (and hence of wave travel times). Our novel method is based on an appropriate randomization of the recorded traces which allows for an accurate interferometric retrieval even from a single moving energy source. Furthermore, our analysis explains why stacking multiple traces may reduce errors due to noise correlation in the interferometric estimates, but also highlights limitations of this approach, and identifies potentially commonly occurring circumstances in which it will fail. Our unified methodology is applicable to both correlated and uncorrelated ambient noise, and is particularly useful in cases where stacking does not improve the signal to noise ratio in the retrieval.
In what follows we begin by briefly presenting the theoretical principles of standard seismic interferometry for the case when individual recordings from static impulsive sources are available, as well as for an ambient noise scenario. Subsequently, we highlight the challenges in retrieving the interferometric inter-receiver Green’s function from ambient noise. The important case of performing interferometric retrieval from wavefields generated by correlated moving sources is considered next, and the corrupting effect due to the Doppler spread on the quality of the interferometric retrieval is discussed. We then review the standard ensemble average approach in ambient noise interferometry, which is referred to as stacking. Potential pitfalls of the retrieval through the stacking average are discussed in terms of the correlation structure of the source signature. Moreover, an analytical decomposition of the ambient noise retrieval is derived which, based on the characteristic correlation length of the source signature, separates the retrieval into a term comprising predominantly coherent contributions that correctly approximate the inter-receiver phase, and the crosstalk term associated with spurious incoherent contributions. These analytical findings motivate the development of a novel and versatile technique, referred to as the random windowing approach. In contrast to the standard stacking average, the random windowing average does not rely on spatial localization of the correlation structure of the source signature, and it can be used for interferometric phase retrieval from short recordings of a correlated noise signal. We conclude by outlining a pseudo-algorithm for carrying out the random windowing retrieval and present an example with the resulting phases retrieved successfully from both correlated and uncorrelated ambient noise.
2. Theoretical Background
It is a well-established principle that cross-correlating seismic observations recorded at different receiver locations yields seismic responses that under certain conditions, provide useful estimates of the inter-receiver Green’s function of linear wave equations. This principle has been expressed in a variety of ways, often via the acoustic or elastic representation theorems, and using either the convolution, deconvolution or cross-correlation of recorded seismic traces. For our purposes, a cross-correlational Green’s function representation theorem is used in line with [
2,
42], in a regime of volume-injection rate impulsive sources.
Consider a medium of density
and sound speed
c, and let
denote the frequency-domain response of the medium to an impulsive source located at
and recorded at location
. Throughout,
denotes quantities in the frequency domain, and
denotes complex conjugation. Given a pair of receiver locations
and
we assume an empirical estimate
of the inter-receiver Green’s function
can be represented as
as in Wapenaar and Fokkema [
2], where each
is an impulsive source location along the boundary
S (see leftmost panel in
Figure 1). The integrand in Equation (
1) is equivalent to the convolution of the time-reversed trace recorded at
with the trace recorded at
, and this operation corresponds to a cross-correlation in the time domain (this statement holds exactly for recordings
defined for all time). Please note that
in Equation (
1) is proportional to the spatial average of the integrand in the right-hand side of Equation (
1) over
S; this interpretation will be helpful below.
The main theoretical requirements for Equation (
1) to provide a good estimate of the inter-receiver Green’s function are: (i) that the medium is lossless, (ii) that the source boundary
S encloses the receivers and that energy is emitted equally from all directions, (iii) that
S is sufficiently far from the receivers for the recorded energy flux to be emitted approximately perpendicular to the boundary, and (iv) that individual recordings are acquired independently for each impulsive source location
within
S. Under these idealized conditions, the empirical estimate
provides a good approximation to the homogeneous inter-receiver Green’s function [
2], i.e.,
−
or, equivalently
−
in the time domain. Wapenaar et al. [
24] discuss spurious effects that are introduced when the above theoretical assumptions are violated. However, in practice artefacts due to the violation of these theoretical assumptions are usually assumed to be sufficiently weak to allow for retrieval of useful empirical estimates of the Green’s function. In particular, seismic interferometry has been shown empirically to be applicable in regimes where the medium illumination is one-sided [
24,
43,
44,
45]. Hence, while the boundary
S in Equation (
1) is typically assumed to be a closed contour which is sufficiently distant from the receivers, we assume
S to be a straight source-line as depicted in the left insets of
Figure 1. It is worth noting that while the amplitudes of empirical Green’s function estimates are acknowledged to be unreliable, the phases of the Green’s function recovered through such a procedure are assumed to be well approximated (see for example, [
2,
46], and remarks under Equation (
9) below).
The main practical obstacle in the context of ambient noise interferometry stems from the fact that the medium’s response to impulsive sources is not readily available. Hence, the information required for constructing the integrand in Equation (
1) cannot be obtained directly. Instead, Wapenaar and Fokkema [
2] propose the ambient noise interferometric relationship defined as
where
denotes ambient noise recorded at a receiver location
for frequency
. It is useful (and theoretically appropriate) to consider each recording
as a random field which is obtained via the Fourier transform of a realization of a stochastic process (a random mechanism) generating
(these statements hold for
, we do not delve into the accuracy of finite-time approximations of the Fourier transform). In such a framework source locations
are no longer required to be known explicitly. Instead, the source characteristics are implicitly present in the ambient noise representation in Equation (
2) and they are accounted for, in principle, in the ambient noise recordings represented via
where
characterizes the unknown
source signature. Wapenaar and Fokkema [
2] showed that provided that
and
are obtained from recordings of uncorrelated noise sources, taking the empirical average over a large number (
) of realizations of
in Equation (
2), referred to hereafter as
stacking, leads to a good approximation (such an approximation is increasingly accurate for
provided that
is ergodic which seems to be a reasonable assumption in practice). of the phase of the inter-receiver Green’s function
. However, as discussed in the subsequent sections, the stacking average is likely to produce unsatisfactory results in the presence of correlated noise sources even when long-time recordings are available.
In this paper, we delve deeper into the relationship between the retrievals
and
in order to identify conditions under which the ambient noise retrieval based on Equation (
2) provides a satisfactory approximation of the retrieval Equation (
1), and to develop techniques that allow us to achieve this for both uncorrelated and correlated noise sources.
To understand the need for the (statistical) averaging over multiple realizations of the ambient noise recordings in the retrieval (
2) for some fixed boundary
S, substitute the integral expression (
3) for the noise recordings at receivers
and
into Equation (
2) to obtain
Please note that the above formula represents a product of two independent integrals which are re-written in a form suitable for the subsequent analysis; to simplify notation, we skip the explicit dependence of
in Equation (
4) on the particular recording of ambient noise, but this remains implicit. The geometry corresponding to the retrieval based on, respectively, Equations (
1) and (
4) is sketched on the left side of
Figure 1. In contrast to
in Equation (
1), where averaging over the source locations
leads to the cancellation of all phases except for the inter-receiver phase, the ambient noise retrieval
in Equation (
4) does not represent a spatial average over
S. Consequently, the retrieval in Equation (
4) is dominated by acausal and spurious information contained in the integrand for
(see middle panel of
Figure 1).
It turns out that a reliable phase retrieval of the inter-receiver Green’s function
from Equation (
4) relies on an appropriate statistical average which couples the source characteristics
and
for all
to generate an integral kernel which is localized in the neighborhood of
. As discussed in subsequent sections, the suitable statistical average depends on the nature of the noise. In particular, for correlated noise sources the (stacking) average over multiple realizations of Equation (
2) is largely unsatisfactory regardless of the amount of available data, and a more versatile averaging approach must be used. Please note that while a constant medium density and propagation speed have been assumed, the analysis presented still holds when these two quantities are space-dependent.
In light of the above discussion, it is clear that the non-averaged relationship in Equation (
4) is dominated by contributions from incongruous source locations, and these may lead to spurious contributions that completely obscure the phase of the inter-receiver Green’s function. Consider, for example, the interferometric retrieval from a simulated sequence of sources which generate seismic impulses in a time-ordered fashion; hereafter, we refer to such a sequence as a
train of sources. The train of sources is not to be confused with a
set of impulsive sources recorded one-by-one and used to calculate the retrieval Equation (
1); these two cases will be compared in what follows. In both cases we assume that the signal is recorded by a pair of receivers deployed in the vicinity of source-line
S. To simulate the wavefield generated by trains of sources, we derived an analytical two-dimensional model which is described in
Appendix A.
Figure 2 illustrates the situation where the interferometric trace retrieved via Equation (
4) from a train of sources differs significantly from the retrieval obtained from individually recorded impulsive sources, according to Equation (
1). Both retrievals are based on sources on the same boundary
S and recorded over the same time interval, except that the impulses in the train of sources are generated in a time-ordered sequence of physical locations which span
S.
When the sources are individually recorded (bottom of
Figure 2) the inter-receiver Green’s function is retrieved accurately with the main arrival found around 2 s, which corresponds to the inter-receiver travel time. In the top panels, the inter-receiver arrival is completely obscured by spurious arrivals, and the phase does not reproduce the reference estimate in the bottom panels.
In addition to the above issues, if the noise source is in motion, one must take into account both the spatiotemporal correlations of the noise sources, as well as the Doppler effect which shifts the energy emitted by the source at each frequency to a range of other recorded frequencies, even if the source is monochromatic. The instantaneous frequency
of the train of sources recorded at location
and time
can be derived analytically from Equation (
3) when it is used to represent a continuously moving source, resulting in the following expression (see the derivation of Equation (
A23) in
Appendix A)
where
is the frequency emitted by the source, and
is the angle between the line from the instantaneous source location at time
to the receiver location
, and the source-line
S. Equation (
5) agrees with the well-known expressions for the Doppler shift for the instantaneous frequency when the source is moving directly towards, or away from, the receiver. However, it is more general in that it quantifies the Doppler spread in the recorded frequency in terms of the relative source-receiver position angle
, when the source has an azimuthal velocity component. It is clear from Equation (
5) that for a pair of distinct receivers
and
, the corresponding relative position angles of the source
and
will be different; this means that the same source emitting the single frequency
will be recorded with a different frequency at each location.
Figure 3 illustrates the relationship between the instantaneous location of a moving noise source and the recorded frequency
at each receiver
and
. As expected, the recorded frequency is higher when the source is moving towards the receivers, and lower as it moves away. Clearly, as can be deduced from Equation (
5) and observed in
Figure 3, the only (instantaneous) location of the source that will lead to recording the true emitted frequency occurs when the source is collinear with the pair of receivers, and the line through
and
is perpendicular to the boundary
S; depending on the geometry of the problem, such a location need not exist. Thus, in general, the frequency emitted by a moving noise source and recorded at the receivers will differ from the true frequency
, and the frequency bias will be different at
and
at all instantaneous locations except when the source is collinear with the receiver pair.
This has important implications for seismic interferometry, since recordings at any particular frequency will usually have been emitted at different source locations. For example, in
Figure 3 the spectrum value recorded at 24.5 Hz would have been emitted by a source that was 800 m along the boundary for receiver
, but only 400 m for receiver
. Given that interferometry is performed frequency by frequency, this inconsistency in the recorded frequency is conceptually equivalent to introducing further spurious contributions from spatially incongruous source locations. Moreover, in the case of a more realistic broadband source the spread and consequent interaction among different frequencies becomes compounded, with multiple source locations along the boundary
S introducing spatially incongruous contributions for every recorded frequency. Hence, the Doppler shift associated with moving noise sources further compounds the spurious effects inherent to ambient noise interferometry if we use the retrieval (
2).
Equation (
5) shows that the Doppler shift is amplified for higher frequencies, as well as increasing source train speeds. In addition, larger inter-receiver distances, as well as near-field recordings, will increase the discrepancies in frequencies recorded
and
, resulting in further deterioration of the interferometric retrieval. Some of the potential obstacles for retrieving inter-receiver Green’s functions from sources in motion have been considered theoretically by Sabra in [
41], where the inconsistency induced by the differential Doppler effect at the receivers is mitigated via a stationary phase approximation, consistent with the discussion above, but under the assumption that the ambient noise sources are statistically uncorrelated in time. More recent work by Liu et al. [
35], applies standard ambient noise interferometry (
2) to an expression for the wavefield generated by a train, and the potential retrievable wave types are discussed using intuitive ray-path arguments which are then tested on train recordings. However, the explicit role of the source speed in the retrieval, and systematic methods for mitigating these effects have not been addressed (and the Doppler effect is not considered in the latter study).
In summary, the crosstalk terms associated with the integrand in Equation (
2) for
emerge naturally in systematic ambient noise interferometry analysis but their effects are commonly removed from considerations by imposing the assumption that the ambient noise field is uncorrelated; such an assumption greatly simplifies the theoretical retrieval procedure but it is often unrealistic. This is the case, for example, when performing traffic or railway noise interferometry. Moreover, ambient noise interferometry (
2) using moving point sources must contend with the Doppler effect in addition to the unavoidable correlations between the noise sources.
We discuss consequences of these issues in detail in the following section, where we point out that the delta-correlated assumption is unnecessarily restrictive, and that excluding near-diagonal terms reduces uncertainty in the retrieved amplitudes. We show in detail how the success of the standard stacking procedure depends on the properties of the source signature and on the source speed in cases where the noise sources are in motion. Moreover, we show how this standard stacking technique may fail, and propose an alternative workflow in
Section 4 which proves to be successful at isolating terms from the incongruous crosstalk terms, leading to the extraction of the underlying inter-receiver phase regardless of the presence of correlation in the ambient noise field. Finally, for the scenario where the presence of correlation is caused by motion of the source, the workflow presented implicitly mitigates spurious contributions due the Doppler spread as explained above.
3. When and Why Stacking Works
A commonly used averaging procedure in ambient noise interferometry, referred to as stacking, relies on the arithmetic average over several interferometric retrievals
in Equation (
2)—e.g., [
2]. Here, we define
stacking more formally as the operation consisting of taking the statistical average of functions evaluated on an ensemble of recordings of the ambient noise field over some time interval; we denote such an ensemble average by
. Below, we formally link the interferometric retrieval (
2) to the retrieval (
1) and determine conditions leading to the likely failure of the stacking average retrieval when the ambient noise field is correlated.
Application of the stacking average operator to the ambient noise retrieval represented via Equation (
4) leads to
where
is referred to as the
kernel of the stacking average or simply as the
stacking average kernel. This is determined by the correlation structure of the noise source characteristics
. Recall that for brevity of notation we consistently skip the explicit dependence of
in Equation (
6) on the particular recording of ambient noise, and that for any fixed
and
, the source signature
should be thought of as a random variable, since it is associated with the Fourier transform of the ambient noise recording
. The stacking average
is linear by construction and we assume that the order of integration is irrelevant (Fubini’s theorem is satisfied). Please note that the averaging in
effectively couples the two integrals which contribute to the single-recording ambient noise retrieval
in Equation (
4). The formula in Equation (
6) formally highlights the link between
in Equation (
1), the stacked ambient noise retrieval
, and the properties of the stacking average kernel
. Clearly, the properties of the kernel of the stacking average control the discrepancy between
and
, which one is interested in minimizing.
In particular, the common interferometric assumption that the ambient noise field is uncorrelated leads to the stacking average kernel in Equation (
6) of the form
where
is the power spectrum of the ambient noise source. There is a further implicit assumption in Equation (
8) that the power spectrum of the source does not depend on the spatial variables. Clearly, if Equation (
8) holds, evaluating the integrals in Equation (
6) leads to
where
represents the retrieval from individually recorded discrete sources defined by Equation (
1). The result in Equation (
9) implies that even in the idealized scenario when Equation (
8) holds, ambient noise interferometry does not generally allow for accurate retrieval of the amplitude of the Green’s function due to the presence of the noise power spectrum
, and the lack of individual source recordings which could allow for a source-specific spectrum correction. On the other hand, the estimated phase remains unaffected by the presence of the real factor
in Equation (
9). Note also that in practice, one has to consider empirical averages in Equations (
6)–(
8) based on a finite number of ambient noise recordings instead of the abstract stacking in terms of the statistical average over an ‘infinite’ number of recordings implied by the operator
. In such a case, if the relationship in Equation (
8) holds in the limit of averaging over several samples tending to infinity, it can be shown that the acausal contributions from off-diagonal, incongruent source locations (i.e., when
) decay with the number of samples and Equation (
9) holds asymptotically; we postpone these more technical details to another publication. A related derivation of a bound on the signal to noise ratio can be found in Appendix B of [
47].
The requirement that ambient noise be delta-correlated in space is restrictive and it is often unrealistic to assume such an approximation in applications. As pointed out in [
48], most sources in practice are, unsurprisingly, neither fully correlated nor fully uncorrelated. Although some advances have been made in the context of simultaneously acting sources [
40,
49] based on deconvolution techniques to tackle the resulting underdetermined systems, further investigation of such a deconvolution setup is required. Nevertheless, the relationship in Equations (
8) and (
9) is desirable, as it allows for the retrieval of correct phases of the associated Green’s functions from Equation (
6). Therefore, in what follows we consider approaches which resemble the relationship in Equation (
8) but without the requirement for the spatially uncorrelated noise sources. The remainder of this section is devoted to the analysis of the stacking average through the properties of the stacking average kernel
in Equation (
7). A new approach, which exploits a different averaging kernel and applies to both correlated and uncorrelated noise sources, is proposed and analyzed in
Section 4.
To gain a theoretical insight into the ambient noise retrieval, we decompose Equation (
4) as
where
and
and
The term
, referred to hereafter as the
ϵ-diagonal term, accounts for the contributions from sources such that
, and one might consider sufficiently small
, in order to minimize the contribution from acausal arrivals from incongruous source locations in
. The decomposition in Equation (
10) holds for any choice of
but taking
does not necessarily remove all acausal arrivals from
for moving noise sources, as explained in the discussion in
Section 2. However, the aim of the above decomposition is to find an appropriate choice of the
spatial correlation length cut-off so that
is dominated by the coherent terms, whereas
is dominated by the acausal contributions due to incongruous source locations. Please note that the decomposition in Equation (
10) could be applied to the integral with respect to
instead of
without loss of generality.
Finally, we consider the stacking average of the decomposition in Equation (
10) given by
with the aim of linking the characteristics of the stacking average kernel
to the spatial correlation length cut-off
in the above decomposition. Please note that in this section no specific assumptions have been made about the source or its spectrum, as the representation (
3) used to arrive at Equations (
10) and (
13) is general.
Interferometric Retrieval from Spatially Correlated Noise Due to a Moving Source
To show the utility of the
-diagonal decomposition (
10) in interferometric analysis, we consider a specific but common example of a broadband source train travelling at speed
and emitting sound within the frequency band
, such that the phases for different emitted frequencies are random and independent from each other (we address correlation of the phase across different frequencies below), i.e., for each frequency
emitted by the source, there is a corresponding random phase shift
uniformly distributed between 0 and
, and we assume that the phase shifts
and
are independent whenever
. In such a case we find (see Equation (
A34) for details) that the stacking average kernel in Equation (
7) is given by
which depends on the train speed
, the mean emitted frequency
and the bandwidth
, and where
. The model and the resulting stacking average kernel (
14) implicitly take account of the Doppler effect (see model derivations in
Appendix A). Expression (
14) for the kernel holds for non-monochromatic sources travelling at speeds
, and for sources whose spectrum can be represented in the form Equation (
A27). In the limit of
, i.e., when the source is monochromatic with frequency
, this kernel reduces to
which is no longer localized. See remarks under Equation (
A24) in
Appendix A concerning a more general model that allows for appropriate localization. For simplicity, we consider the parametrization of the straight boundary
S given by
, where
corresponds to the location where the line through the receiver pair intersects the boundary
S, so that
and similarly
. Using this parametrization we have that
in Equation (
14). For further details see Equation (
A35) and accompanying remarks in
Appendix B.
The movement of the noise source allows us to parameterize any locations
via the travel time so that
and
. The analytical expression for the stacking average kernel associated with a broadband correlated train of sources allows one to identify the approximate correlation length beyond which the envelope of the kernel (
14) decays rapidly. The unwanted long-range correlations contributing to the kernel away from the neighborhood of
,
, represent the main contribution to
in Equation (
13) from the acausal arrivals. Based on Equation (
14), we set the spatial correlation length cut-off
to twice the value of the first zero of the envelope of the stacking average kernel (see the bottom panel of
Figure 4), which is given by
for non-monochromatic sources travelling at speeds
.
Figure 4 shows the numerically approximated stacking average kernel
and its exact form (
14). The rate of decay of the source kernel in terms of the spatial discrepancy
is controlled by the envelope of the
factor in Equation (
14). As noted before, the spatial discrepancy can be represented in terms of the travel times so that
, where
is the train speed. Although the choice of the spatial correlation cut-off length
has so far been somewhat arbitrary, the main principle in its determination is to use the structure of the stacking average kernel
in order to approximately identify the correlation length beyond which the crosstalk terms dominate the correlation structure represented by
.
Please note that while the decomposition (
10) and its stacking average Equation (
13) are given in the frequency domain to highlight the interaction between incongruous source locations, the corresponding stacking average decomposition in the time domain is given by
and its stacking average is given by
The above time-domain decompositions hold due to the linearity of the inverse Fourier transform; as is standard in the interferometric literature, we assume throughout that h is square-integrable in (so that the Fourier transform is well-defined on ) and that and are such that Fubini’s theorem holds throughout. Thus, the diagonal band decompositions of h and are dual to each other and they can be considered interchangeably.
The formulation of the
-diagonal decomposition in the time domain (i.e., representation (
16)) lends itself directly to the analysis based on time series recordings provided that a suitable spatial correlation length cut-off
can be determined. When the noise source is in motion, as in the case considered here, the spatial correlation length cut-off
has a temporal counterpart
for
. In
Section 4, we will focus on a procedure for recovering the
-diagonal band structure, analogous to that in
in Equation (
17), from a single short recording when the value of the spatial correlation length cut-off
cannot be estimated from the stacking average kernel. First, however, we outline the general properties of the stacking average and its potential pitfalls when retrieving the inter-receiver interferometric Green’s function. To this end we consider two different classes of noise sources, and we compare the standard stacked retrieval
and its stacking average
with retrievals based on isolating the
-diagonal band terms
and
in the decomposition Equations (
16) and (
17), respectively, using an ‘informed’ choice of the spatial correlation length cut-off
given by Equation (
15).
In
Figure 5 we use the spatial cut-off radius
determined in Equation (
15) and apply the
-diagonal decomposition procedure laid out in Equation (
16). First, we assume that individual recordings are statistically independent. In particular, for a noise source with a broadband spectrum this constraint translates to the requirement that the frequencies associated with the random source signature
be independent. To generate this figure, a train of sources travelling at 90 km/h along a straight boundary and emitting sound between 10 Hz and 25 Hz was simulated to generate the ambient noise recordings
and
. The receiver pair was oriented perpendicularly to the source-line, with an inter-receiver distance of 2000 m and the closest receiver 400 m from the source-line. The medium speed was set to 1000 m/s. The ambient noise recordings were transformed to the frequency domain and
was computed according to Equation (
2). Then the
-decomposition was calculated for a single run to generate the insets in the top row. The frequency-domain data were used to extract the phase in the top-right panel of
Figure 5, as well as transformed back to the time domain to generate the time series
,
, and
in the top left panel of the same Figure. This procedure was repeated 1000 times and averaged to calculate
, as well as its
-decomposition with the associated phases (right) and waveforms (left) in the bottom row of this figure.
In this setup, and given the stacking average kernel in Equation (
14), the retrieval of the inter-receiver Green’s function is shown in gray in both the temporal and the spatial domains, with the
-diagonal terms
,
in blue and the crosstalk
,
shown in orange. According to Equation (
15), we chose
m which corresponds to a time window length
s. The resulting spectra were transformed to the temporal domain via inverse Fourier transform to show the effect of the decomposition on the retrieved waveforms; the components of Equation (
16) are shown in the left panels of
Figure 5. The decomposition was applied to a single five-minute recording of a source train (top), and to the averaged waveform resulting from stacking 1000 such recordings (bottom). The phases corresponding to each waveform are shown in the panels to the right in
Figure 5.
This illustrates the improvement in the retrieval using the decomposition (
10) which satisfactorily separates the causal contributions to the inter-receiver phase from the spurious crosstalk.
Please note that the phase of
in the bottom right panel of
Figure 5 (in blue) typifies the phase retrieved from the standard ambient noise interferometry approach which relies on averaging over stacked independent recordings of ambient noise of the form (
2). Although stacking can clearly improve the quality of the retrieved phase of the inter-receiver Green’s function, significant discrepancies remain especially in the higher frequency range of the spectrum due to the increased influence of the Doppler shift even for many stacked recordings. It should be noted that sources travelling at different speeds and overlapping with each other may help improve the rate of convergence of the standard stacking to the correct phase. On the other hand, the phase of the
-diagonal term
shows a much better agreement with the uncorrelated reference estimate even for a single short recording (blue phase in top-right panel of
Figure 5). Application of the stacking average to the
-diagonal term
leads to a further but slight improvement over the standard stacking applied directly to
which is evident in the higher frequency range of the spectrum. This increased uncertainty for higher frequencies is to be expected given the presence of the Doppler shift.
The above experiment indicates that although there is some flexibility in the choice of the spatial correlation length cut-off
, it is possible to carry out the
-diagonal decomposition (
10) in a way which on average, largely separates the desired contributions from coherent sources and the crosstalk between incongruous source locations into two distinct terms
and
respectively. In the case of correlated moving noise sources, a sufficiently small correlation length cut-off
results in attributing a significant proportion of the coherent noise contributions which should be captured by
relative to the ‘crosstalk’ term
; the situation is reversed for a sufficiently large
. If we were to evaluate these integrals explicitly in order to estimate the true phase, the challenge lies in determining an optimal value of the spatial correlation length cut-off
that minimizes the error in the phase retrieval in the absence of knowledge of the spatial correlation structure contained in the stacking average kernel
in Equation (
7).
It is worth noting that a successful phase retrieval of the inter-receiver Green’s function based on the empirical average of stacked recordings fundamentally relies, via the law of large numbers, on the statistical independence of the individual recordings. In particular, for a noise source with a broadband spectrum this constraint translates to the requirement that the frequencies associated with the random source signature
are independent; otherwise, the empirical average of the stacked retrieval
might still converge as
but there are no guarantees that the resulting limit
will lead to the recovery of the true inter-receiver phase.
Figure 6 shows one such example, where the setup and parameters are identical to those in
Figure 5, except that the phases of the frequencies emitted by the sources are no longer uncorrelated. Such a scenario would be likely to occur, for example, due to an interaction between sleepers on a rail track and the wheels of the train [
32]. Specifically, for each frequency
emitted by the source, the random phases associated with each emitted frequency are such that for a pair of emitted frequencies
and
,
, the phase shifts satisfy
, where
r is uniformly distributed between 0 and
, so that the phase shifts
and
are no longer uncorrelated.
The gray trace in the middle panel of
Figure 6 shows the averaged waveform in the time domain obtained by stacking 1000 five-minute long recordings. In contrast to the result of the equivalent procedure shown in
Figure 5 (bottom left), stacking the recordings does not converge (or does not converge sufficiently quickly) to the correct inter-receiver arrival, and the crosstalk term (orange) in the middle panel does not decay. The corresponding phases are shown in the right panel of
Figure 6 with the phase estimated from stationary discrete sources (see Equation (
1)) in dashed black as a reference. However, the phase of the inter-receiver signal is still successfully isolated by the
-diagonal decomposition (
10), as indicated by the blue line, showing a good agreement with the phase of the inter-receiver Green’s function estimated from discrete sources, as per Equation (
1).
Clearly, in most real-world applications it is not possible to gauge the retrieval against some reference benchmark, which leaves the retrieved results and the convergence trends open to subjective interpretation. Moreover, even if enough recorded data are available to allow for stacked averaging but the stacking average kernel
in Equation (
7) is not spatially localized, there is no a priori criteria for estimating the appropriate value for the spatial correlation length cut-off
needed in the decomposition (
10) (compare the example in
Figure 6 with that illustrated in
Figure 4 and
Figure 5). The key point of these considerations is that in principle it remains possible to estimate the correct phase of the underlying Green’s function from a single short recording of a correlated ambient noise source, and that the cause of the noise correlation is not relevant in such considerations.
In summary, in this section we discussed a constructive example of the interferometric retrieval of the inter-receiver Green’s function which was based on an abstract decomposition of the retrieval (
2) as
, as in Equations (
10) or (
16), and based on an ‘informed’ choice of the spatial correlation length cut-off
so that the undesirable crosstalk terms are largely contained in the
term. This construct, driven by analytical insight and illustrated in
Figure 4,
Figure 5 and
Figure 6, suggests that it should be possible to carry out such a decomposition and subsequently obtain the phase estimate of the inter-receiver Green’s function if a suitable correlation cut-off length
can be estimated. Importantly, this approach would allow one to sidestep the classical stacking average approach and avoid its potential pitfalls. In the following section we propose a workflow that facilitates the extraction of the interferometric phase estimate from a single short recording of a passing train (i.e., an instance of a moving correlated noise source). In fact, the choice of the correlation length cut-off
in the example in
Figure 6 was determined via a procedure described in the next section, since in the setup of
Figure 6 the stacking average kernel
was not spatially localized due to implicit correlations between the noise sources.
4. The Random Windowing Technique
Based on the analytical insight outlined in the previous sections, we consider the problem of estimating the phase of the inter-receiver Green’s function from a single short recording of a train of sources, without resorting to the standard averaging over a stacked ensemble of recordings. Nevertheless, the retrieval based on this procedure effectively results in the extraction of the
-diagonal term in the decomposition
introduced in Equation (
10), where
is determined without the explicit knowledge of the stacking average kernel
and the need for it to be localized. Importantly, the technique applies to both correlated and uncorrelated noise sources.
Consider a single time-domain recording of a train of sources at a receiver pair
and
and denote these recordings by, respectively,
and
whose Fourier transform leads to the retrieval
in Equation (
2), representing the interferometric approximation of the inter-receiver Green’s function
. In the previous section the correlation length cut-off
had to be estimated from the stacking average kernel
in Equation (
6), assuming that the stacking average had a localized kernel (e.g.,
Figure 4) which is not always the case for correlated noise sources (e.g.,
Figure 6). We now determine an optimal spatial/temporal window size that minimizes the effect of acausal arrivals in the interferometric retrieval from a single short recording. The approach relies on the use of an appropriate randomization technique, termed
random windowing, in order to introduce incoherence into the acausal part of the phase in the retrieval
in the available recording and mitigate the influence of the crosstalk terms, largely contained in
of the
-diagonal decomposition of
, which hinder the accurate retrieval of the Green’s function
.
4.1. General Setup
First, consider the interferometric retrieval
in Equation (
2) obtained from recorded time series data within an interval/window of duration
and centered at some randomly drawn time
. For a pair of receiver locations
and
restrict the recorded time series
and
to that temporal window (see
Figure 7). The random windowing method introduced below can be applied to recorded time series generated by arbitrary noise sources. However, in what follows we consider correlated noise generated by a source moving with speed
. This setup is relevant in applications, and it also aids understanding of the theoretical underpinning of the random windowing method, since it allows one to link the windowing of the time series to the spatial windowing, as illustrated in
Figure 7. For time series generated by arbitrary noise sources,
does not correspond to a physical velocity, and stands for an abstract parameter that allows for the space time transformation. Next, for a given average source velocity, the window of duration
T is linked to a spatial window which we define for a receiver location
as
and which corresponds to the segment of the (spatial) source boundary traversed by the source moving at speed
from the origin at time zero, and recorded at a receiver location
within the time window of duration
T centered at
. As illustrated in the right panel of
Figure 7, these boundary segments will differ for each receiver, depending on the inter-receiver distance, the distance to the source-line, and the speed of the source. The theoretical spatial location of
for each receiver can be calculated explicitly using Equation (
A11) and scaling by the source speed (or by its estimated mean), as described in
Appendix A. The frequency-domain representation of the time-windowed data at each receiver can be expressed as
where
is given by Equation (
18). Please note that explicit knowledge of the signals integrated along boundary sections is not required in practice, as the windowing procedure is performed directly on the full recorded time series. A source travelling at speed
will span a distance no greater than
during a time window of duration
T, so that for any random time window location
the corresponding spatial location
satisfies
. Restricting the time window size to
T will implicitly restrict the spatial locations contributing to the cross-correlation terms, and it will limit the crosstalk between incongruous source locations. Choosing the set of source locations
such that
allows us to write a recording Equation (
19) in a manner consistent with representation (
3); namely
where
is the indicator function of a set
A; i.e.,
if
and
if
. Then, applying the ambient noise retrieval Equation (
2) to the time-windowed recordings (see
Figure 7) for a window of duration
T centered at
results in the windowed retrieval.
where the second equality is derived by substituting Equation (
20) for each recording, and recasting the resulting product of integrals in a manner consistent with the form of Equation (
4). Please note that as in Equation (
4), the term
is still present in Equation (
21), and one could consider applying the standard stacking to Equation (
21). However, if the resulting averaged/stacked source kernel
is not sufficiently localized (as discussed in
Section 3), there are no guarantees that we will retrieve the correct inter-receiver phase. Instead, we consider a new averaging operator which relies on averaging over randomized window locations
. We denote this operator by
and define it for a scalar function
as
where
is the probability density function associated with the choice of locations
which is parameterized by
T and assumed to be non-zero only on a bounded interval, and the function
is integrable with respect to
. Next, we define the
averaged retrieval for a time window of duration T as
where we assume that Fubini’s theorem is satisfied and the
random windowing averaging kernel is defined as
Please note that given the above formulation and the
-diagonal decomposition (
10), we have
Furthermore, similarly to standard averaging via the stacking of recordings as in Equation (
6), the structure of the random windowing averaging kernel determines the quality of the retrieval of the interferometric phase estimate from
. Note, in particular, that the random windowing averaging kernel
is a symmetric function of the locations
, and is concentrated in the neighborhood of
. Note also that the source speed
affects the properties of the random windowing averaging kernel. As suggested by the results illustrated in
Figure 7 and confirmed by analytical estimates for a typical setup in
Section 4.2, the random windowing procedure has an analogous effect to the
-localization procedure in decomposition (
13) in the sense that interactions between incongruous source locations are implicitly restricted to within the time/space window of the recordings before cross-correlating. Hence, the contribution of the acausal crosstalk is mitigated. Moreover, if sufficient data are available, one can additionally apply the standard stacking operation to
which yields the general retrieval
since the stacking
and the window randomization
are independent by design, and therefore they commute. Importantly, the form of (
26) implies that the random window average and the stacking average play, in principle, a similar role in the elimination of the crosstalk contributions in the ambient noise retrieval, in the spirit of the
-diagonal decomposition (
10). For
the random windowing kernel
will concentrate the integrand in (
26) to within the
neighborhood of
. A similar effect is achieved if enough data are available, and the noise sources are such that the stacking average kernel
localizes around
(see
Section 3 and
Figure 4). In principle, both operations can be applied concurrently. However, in contrast to the stacking average, the use of random windowing averaging relies on a single recording provided that a suitable value
T of the time window can be determined; derivation of such a procedure described in
Section 4.3 and
Section 4.4 is preceded by a simple example which is aimed at elucidating the main properties of the random windowing average.
4.2. Example: Uniformly Distributed Window Locations
Consider a default situation in which the temporal window location
is uniformly distributed within the interval
. We then have
Then, if we define the spatial source location at time
as
, we have
. Applying the operator
with the uniform density
to the product of the indicator functions in Equation (
24) corresponds to the convolution of two boxcar functions of constant height one and width
, parameterized in terms of the source location discrepancy
,
.
A standard calculation shows that applying operator
to the product of indicator functions in Equation (
26) yields a symmetric kernel in the random windowing averaging of the form
which represents the tent map supported on the interval
and centered at
, where
denotes the Heaviside function. An example of Equation (
28) is plotted in the right inset in
Figure 8, where the kernel (
28) is plotted in blue, and its approximation obtained via direct simulations is indicated by the dotted black line.
It is informative to compare the kernel
of the random windowing average in Equation (
28) illustrated in
Figure 8 with the stacking average kernel
in Equation (
7) which is illustrated in
Figure 4. Recall that both these kernels are present in the retrieval (
26) and that they play, in principle, a similar role aimed at reducing the contribution from the off-diagonal, crosstalk terms into the retrieval. One can see that the random windowing procedure is akin to extraction of the
-diagonal term
in the decomposition (
10), in the sense that for any pair of sources such that
the random windowing kernel mutes the corresponding acausal contributions. For uniformly distributed locations of the random windows around
, only the contributions satisfying
are included in the resulting retrieval (
26). Moreover, the analytical expression (
28) implies that the width of the diagonal band induced by the random windowing procedure is directly proportional to the source speed
. This suggests that a progressively shorter time window size
T should be chosen for increasing train speeds to mitigate acausal arrivals associated with the crosstalk terms in the ambient noise retrieval (
10).
4.3. Estimation of the Retrieval for a Given Time Window Size T
In this section, we describe a procedure for the estimation of the random windowing average
with some fixed time window length
. A systematic way of choosing the optimal time window size
for any given single recording by means of estimating the acausal energy present in each averaged retrieval
in Equation (
23) is discussed in
Section 4.4.
As outlined in
Section 4.1 and illustrated in
Section 4.2 the random windowing procedure will lead to an improved phase retrieval through damping the contributions from the cross-correlation terms in Equation (
26) provided that a suitable temporal window length
is identified, so that the random windowing kernel
in Equation (
23) effectively enforces the recovery of the diagonal band term
in the decomposition (
10). In the case of standard stacking the choice of the spatial correlation length cut-off
can be inferred from the form of stacking average kernel
in Equation (
7), assuming that the kernel is sufficiently localized. Clearly, the structure of the stacking average kernel cannot be determined from a single short recording or in situations when the stacking average kernel can be estimated but it is not sufficiently localized (e.g., as in the example presented in
Figure 6). On the other hand, the random windowing average with an appropriately chosen
T can be applied to obtain the retrieval
in the most challenging case where only a single (possibly short) recording of a source train is available. Moreover, as mentioned earlier and expressed via Equation (
26), if enough recorded data are available, one can combine both averaging techniques to recover
. It is clear that the use of such a doubly averaged approach is more versatile than using only standard stacking in the absence of knowledge about the structure of the noise sources, and it will lead to an improved retrieval, as long as at least one of the kernels is sufficiently concentrated around the set
,
.
The first step in the procedure for the empirical approximation of the random windowing average
with some fixed time window length
T is to identify the approximate time in the recorded traces when the source train is collinear with the receiver pair (see for instance [
31] for one example of a method that achieves this using plane-wave beamforming). We chose a configuration in which the line through the receiver pair is perpendicular to the source boundary, as depicted in
Figure 2. In such a geometry the Fresnel zone of the receiver pair is symmetric about the stationary phase point (see [
40]) on the source boundary but this is not essential to the random windowing method. To see this note that when the instantaneous source location is collinear with the receiver pair the signal recorded at the two receivers is generated at
, and the frequency recorded at each receiver, as well as the Doppler bias, are the same (
in Equation (
5)). Thus, in this instantaneous configuration the effect of spurious contributions to the retrieval is minimized. Finally, note that regardless of whether or not the noise source is in motion, the source location contained in the line through the receiver pair coincides with the stationary phase point (on the source boundary) which is always contained in the Fresnel zone of the receiver pair, unless the line through the receivers is parallel to the source boundary.
The time tag at which the source is collinear with the receivers is denoted by
(see left panels of
Figure 7) and the corresponding source location is denoted by
. Please note that while
is a known time tag, explicit knowledge of
is not necessary in practice. However, this notation will be useful when describing our method in the steps below. We note further that
will by construction be in the middle of the Fresnel zone of the receiver pair. Note also that for a given recording of a single source moving in one direction, the time tag
remains fixed throughout the procedure.
To estimate consider the sequence , , and perform the following steps for the given pair of recordings and :
- (i)
Generate a set of random window locations of a fixed duration
T by sampling from the distribution associated with
in Equation (
22), and such that
. There is flexibility in the choice of the probability distribution
of
provided that
and that
is non-zero only on a bounded interval.
- (ii)
For each extract the windowed recordings and of duration T and centered at from and . This procedure is illustrated in the left panels of
Figure 7 and is performed on the recorded time series for each receiver.
- (iii)
Perform interferometry using the windowed data and derive a realization of
according to Equation (
23), using the Fourier transforms of the windowed recordings from (ii).
- (iv)
Repeat Steps (i)–(iii) for a large number N of random window locations and take the arithmetic mean of
in order to approximate the random windowing kernel
given by Equation (
24), and to obtain the ambient noise retrieval
in Equation (
23).
Due to the law of large numbers, one has
and the empirical mean of the sample
converges, in the limit of
, to the expectation of
as in Equation (
22) provided that
has a finite variance (i.e.,
) and the locations
are sampled independently from
. Finally, note that the steps outlined above do not require knowledge of the source speed
, or for the source to be in motion, as long as it is possible to identify a suitable recorded time
corresponding to the emission of energy from a source location collinear with the receiver pair.
4.4. Procedure for Choosing the Optimal Time Window Size in the Random-Window Averaged Retrieval
The procedure described in the previous section can be applied to any time window of size
. However, similarly to the general
-diagonal decomposition defined in Equation (
10), a successful approximation of the phase of the inter-receiver Green’s function depends on the choice of an optimal support of the random windowing kernel
in Equation (
24). The steps outlined below provide a systematic way to choose an optimal time window size
for any given single recording by estimating the acausal energy present in each averaged retrieval
in Equation (
23).
To determine for a given pair of recordings and , consider first a sufficiently large range of test values ; this range will necessarily be limited by the length of the recordings available. Steps to find the optimal window size for any fixed pair of recordings and are:
- (1)
Choose an ordered collection of time window sizes contained in the test range .
- (2)
Estimate the averaged retrieval
for each time window size
using the procedure described in
Section 4.3.
- (3)
Transform each estimate to the time domain to obtain the corresponding waveforms .
- (4)
Order the waveforms
according to the time window size
(see bottom panel of
Figure 9).
- (5)
Use the ordered family of waveforms
to identify the time tag of the
feature of interest, which is normally the first-arriving dominant wave, denoted by
. In
Figure 9,
s and is highlighted in orange.
- (6)
Calculate the waveform energy of each trace
from time
to a time slightly before
. This calculation quantifies the acausal energy present in each of the traces as a function of
. (In
Figure 9 the waveform energy is calculated for each trace between
and
s. Please note that this curve would be different for different recordings.)
- (7)
Compute the spurious waveform energy calculated in Equation (
4) as a function of the time window sizes
(see the top panel of
Figure 9).
- (8)
Minimize the spurious energy with respect to ; the minimizer is denoted by .
- (9)
Use the optimal window size
determined in Equation (
8) for the interferometric retrieval by following the steps described in
Section 4.3. Compute the final/optimal ambient noise retrieval of the inter-receiver Greens function
given by Equation (
23) or its time-domain counterpart
.
Please note that the above procedure can easily be automated. For multiple source train recordings Steps (1)–(8) can be performed for each recording, noting that the optimal window size will, in general, be different for each recording. Then, the standard stacking average can be applied to estimate and thus potentially improve the retrieval even further.
4.5. Results
We now present some examples of the interferometric retrieval obtained via the random windowing method described in
Section 4.3 which is carried out for the optimal time window size
determined according to the procedure described in
Section 4.4. We present results for a receiver pair that is perpendicularly oriented with respect to the source-line, as well as results for a receiver pair that is oriented at 30 degrees with respect to the source-line. The results are of essentially identical quality.
The distribution of the window locations is chosen to be uniform, as in the example of
Section 4.2. The random windowing method was applied to the two configurations discussed in
Section 3. These configurations corresponded to (i) the case where the stacking average kernel
of the ambient noise is localized, as in
Figure 5 (left panels of
Figure 10), and (ii) the challenging scenario when
is not localized and the spurious crosstalk arrivals do not decay (right panels of
Figure 10). The random windowing method leading to
is applied in all cases to a single short recording of a train of sources, and the stacking average is not applied to
unless otherwise stated.
Figure 10 shows phase estimates produced by the random windowing average in blue and the phase estimate based on
in light gray for comparison. Receivers oriented perpendicularly with respect to the source-line were used in the top row, whereas the bottom row of
Figure 10 shows the results for a receiver pair oriented at 30 degrees with respect to the source-line. The phase estimated from individually recorded sources, which allows us to calculate the reference retrieval using Equation (
1), is indicated in dashed black. We also show the phase of the inter-receiver Green’s function retrieved through the standard stacking average over six source train recordings restricted to a one-minute-long window in a similar fashion to [
34]. For illustration,
Figure 11 shows the waveforms corresponding to the phases depicted in the top-right panel of
Figure 10 (i.e., a correlated noise source with correlated phases, and perpendicularly oriented receivers) with normalized amplitudes. As expected, the waveform corresponding to the random windowing estimate (in blue) exhibits the best agreement with the reference waveform (in black dashes). The phases estimated from the random windowing scheme agree well with the reference phase in all cases, showing that our method is effective in mitigating the undesired effects of correlation in the noise sources in ambient noise interferometry, especially where moving sources such as trains or traffic on a highway are concerned.
We note that the workflow outlined in
Section 4.4 and illustrated above is not restricted to ambient noise interferometry from correlated noise. In fact, this methodology can be applied to short recordings of uncorrelated noise to find an optimal window size that minimizes spurious arrivals. In this scenario, the choice of
as described in
Section 4.3 is more flexible as the source no longer exhibits coherent spatiotemporal correlation or a Doppler spread differential, as would be the case for a source in motion. Therefore, the choice of time tag
may be arbitrary and the random windowing method can still be applied as described in
Section 4.4.
Figure 12 shows the phase estimated by the random windowing method when the ambient noise recording is uncorrelated. The random windowing method was applied to a short (5 min) recording of uncorrelated ambient noise, producing satisfactory results. Our approach has the potential to speed up convergence to the inter-receiver phase when applied to uncorrelated ambient noise.
Finally,
Figure 13 shows the mean optimal window sizes
as a function of the distance from the closest receiver to the source-line; in the illustrated case the inter-receiver distance is 400 m, and three different source speeds are considered. Recall that optimal window sizes
are, to some extent, recording-dependent due to the randomness in recorded signals, so their value will vary for different recordings even if the receiver geometry remains fixed; hence, 30 samples for each considered closest receiver distance and velocity were calculated and averaged to produce the curves in
Figure 13; the fluctuations in the curves diminish with the number of samples used. The relationship illustrated in
Figure 13 indicates that for a fixed source speed and a fixed inter-receiver distance, the optimal window size decreases with the distance of the receivers to the source boundary. Moreover, the optimal window size decreases for increasing source speeds, which agrees with physical intuition and with the form of the random windowing average kernel (
28) illustrated in
Figure 8. Furthermore, as can be deduced from Equations (
5) and (
A11), the spatial discrepancy of the implicit source locations (i.e., the difference
) will be smaller for decreasing inter-receiver distances. Consequently, the error due to the presence of the residual crosstalk terms in the retrieval decreases with decreasing inter-receiver distance, while larger inter-receiver distances require smaller values of the time window size
to mitigate this error.
5. Discussion
The ambient noise interferometric retrieval based on the random windowing average (
Section 4.3 and
Section 4.4) is capable of accurately estimating the phase of the inter-receiver Green’s function based on a short pair of recordings of a correlated noise source (
Section 4.5). As pointed out in Equation (
25), the random windowing average allows one to extract the
-diagonal term in decomposition (
10) of the ambient noise retrieval
without the explicit knowledge of the stacking average kernel
which need not be localized. For a given pair of recordings this new technique requires one to pre-process the spurious energy to determine the optimal time window size
; then, the optimal spatial window length can be used in decomposition (
10) to retrieve
with the optimal cut-off length
. This procedure, illustrated in
Figure 9, can be automated in practical applications. Based on the derivations and the discussion in
Section 3 and
Section 4, the optimal value
of the temporal window size decreases with the inter-receiver distance and with the distance between the source-line and the receiver pair, since the Doppler shift and the spatial incongruence become increasingly important. Moreover, the optimal value of the temporal window size decreases for increasing speeds of the moving noise source, as indicated in Equation (
28). These trends are consistent with physical intuition, as faster source speeds have the compound effect of increasing the spatial coverage (and hence the prevalence of the crosstalk contributions to the ambient noise retrieval), and increasing the importance of the Doppler shift for the retrieval.
The random windowing approach can be used in conjunction with the standard stacking average procedure as discussed in
Section 4. This is especially tractable when estimating inter-receiver Green’s functions from a small ensemble of source train recordings, where the random windowing procedure can be applied to each individual train recording before performing standard stacking to optimize the extraction of causal information. In Ref. [
34], the distance from the receiver array to the rail track on which trains provide source energy is of the order of 5–10 km. This large distance combined with the stationary phase zone approach likely mitigates spurious effects arising from the Doppler effect induced by the trains’ motion, since such a receiver array configuration and their localization methodology implicitly restrict the recordings to regions of the source boundary on which the train is roughly collinear with the receiver pairs. In fact, the choice of the optimal time window size in [
34] in the stacked retrieval is consistent with our findings. Incorporating our methodology in such a setup could allow for the retrieval of inter-receiver estimates from near-field recordings (closer to the rail tracks) maintaining the optimality of extraction of coherent information from each recorded trace.
Sabra [
41] concludes that under some circumstances the Doppler effect does not have a significant effect on the interferometric estimate of the inter-receiver Green’s function. However, that work explicitly assumes that the ambient noise field is uncorrelated in time. To achieve this the sources considered were randomly and uniformly distributed in space around the receiver pair. The situation where the ambient noise field is statistically correlated is not considered. Moreover, to mitigate the differential Doppler effect, a stationary phase approximation is performed throughout, and the far-field assumption is made. These approaches are consistent with our findings, as well as with the random windowing method presented in this work.
In interferometric studies from traffic noise, such as the one performed by Behm et al. [
30], wave velocities are retrieved from a linear array of receivers which crosses the main road perpendicularly. In this setup, the choice of the time window sizes could be further informed by our proposed framework, possibly using consistently shorter time windows for receiver pairs which are closer to the road. This interesting study points out that longer observation periods do not necessarily lead to better results, which indicates that correlations in the underlying ambient noise are too significant for the stacked-averaged retrieval to be appropriate, analogously to the case illustrated in
Figure 6. Thus, integrating the random windowing approach could be beneficial by providing an informed method for selecting temporal window sizes so that the effect of spurious arrivals is minimized, or even averaging over different time window sizes. Liu et al. [
35] conclude, using physically motivated ray-path arguments, that the crosstalk can be negligible when performing interferometry using seismic noise generated by high-speed trains in the case of direct, scattered and refraction waves. Our approach to interferometry from correlated noise sources links the quality of the retrieved phase of the inter-receiver Green’s function from direct waves explicitly to the correlation structure of the noise source, as well as the receiver array geometry (inter-receiver distance and distance to the source-line) while further providing a methodology to mitigate the spurious effects induced by any of these causes. In the common case of sources in motion, we established a link between the spurious energy in the retrieved trace and the speed of the source, and the proposed methodology has the potential to mitigate spurious effects induced by the Doppler spread for this type of source. We further note that the receivers in [
35] are very close to the location of the source train, confirming that correlated noise interferometry is applicable in the near-field of seismic noise generated by road traffic and trains.
We constrained our study to estimating the direct wave arrivals in two dimensions, which loosely resembles the case of surface waves in the three-dimensional setup. Similar derivations could be carried out for body waves that have been critically refracted in three dimensions along horizontal interfaces in the subsurface and then refract back up to the surface. Our methodology could be used to image and monitor the subsurface using these non-physical refracted arrivals, as suggested by King and Curtis [
50] and Brenguier et al. [
32], respectively.
Finally, our method is optimized when the location of the source in motion can be identified from a location tracker or from the acquired data. The latter approach has been shown to be possible for example in [
32,
34], but may not be feasible in an urban environment with a more complicated road network and multiple simultaneous sources. In those situations, one could apply directional decomposition of the wavefield into constituent energy from its different individual sources prior to using these methods, similarly to the methods employed by Maranò et al. [
51]. Although we have focused on the application of interferometry from noise sources in motion, representative of most known physical sources that contribute to ambient noise, the random windowing method can be used regardless of the underlying cause of statistical correlation in the ambient noise field.