## *2.1. Array Geometry*

The passive noise data were acquired by the PRMRA installed by Equinor in the Norwegian North Sea. The PRMRA contains 17 cables with lengths from 1.5 to 12.85 km, covering an area of approximately 50 km². The array geometry is shown in Figure 1. The PRMRA is oriented in a 26–206° direction. The cable spacing is 300 m and the receiver spacing along each cable is 50 m.

## *2.2. Noise Analysis*

The noise data used here were continuously recorded by the hydrophones at each receiver station for 1.02 h at a sampling rate of 500 Hz on 14 September 2015.

#### 2.2.1. The 'Natural Ambient Noise'

Figure 2a shows the noise recording at one receiver. The portion of the recording marked with a blue box in Figure 2a is relatively unaffected by severe events and is denoted as 'natural ambient noise'. Figure 2c shows an enlarged view of the 'natural ambient noise', and Figure 2d presents its power spectral density (PSD) in blue. The PSD of the 'natural ambient noise' shows a peak at 0.7 Hz, which can be recognized as the peak of the microseisms in the ocean [28]. As the frequency increases from 0.7 Hz to 4.5 Hz, the PSD decreases gradually and no other peak appears. In fact, the frequency band of 0.2 to 4.5 Hz of the ocean ambient noise is usually dominated by the microseisms, which are believed to be generated by wind-related wave–wave interaction [28]. The microseisms are thought to be more evenly distributed in the ocean, so the frequency band of 0.2 to 4.5 Hz is chosen for applying PSI in this paper.
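As a minimal sketch of how a PSD curve such as those in Figure 2d might be estimated, the following applies Welch's method (SciPy) to a synthetic record. The paper does not state which PSD estimator was used, so the estimator, the 50 s segment length, the synthetic signal (a 0.7 Hz sinusoid in white noise), and the helper name `estimate_psd` are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

FS = 500.0  # sampling rate in Hz, as stated for the recordings


def estimate_psd(record, fs=FS, segment_seconds=50.0):
    """Welch PSD estimate; the segment length is an assumed, illustrative choice."""
    nperseg = int(segment_seconds * fs)
    freqs, psd = welch(record, fs=fs, nperseg=nperseg)
    return freqs, psd


# Synthetic stand-in for a noise record: a 0.7 Hz component buried in white noise.
rng = np.random.default_rng(0)
t = np.arange(0, 300.0, 1.0 / FS)  # 5 min of samples
record = np.sin(2 * np.pi * 0.7 * t) + 0.5 * rng.standard_normal(t.size)

freqs, psd = estimate_psd(record)
band = (freqs >= 0.2) & (freqs <= 4.5)  # band chosen for PSI in the text
peak_freq = freqs[band][np.argmax(psd[band])]
print(f"PSD peak inside 0.2-4.5 Hz: {peak_freq:.2f} Hz")
```

With 50 s segments the frequency resolution is 0.02 Hz, which is fine enough to resolve a peak near 0.7 Hz; shorter segments would give a smoother but coarser curve.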

**Figure 1.** The PRMRA geometry. Not all the receivers are plotted in the figure for display purposes. The gather marked in red is used in the following example of beamforming. The gathers marked in green and red are used in the following examples of the NCC retrieval. The black circle and marks on it are used in the following study of events.

#### 2.2.2. Event A

In Figure 2a, waves with large amplitude (denoted as Event A) dominate the noise from 4 to 26 min and from 46 to 57 min. Figure 2b shows the averaged spectrogram of all receivers. Most of the energy of Event A lies in the frequency band above 5 Hz. Part of the energy also affects the frequency band of 2 to 5 Hz considerably (especially in the time window of 20 to 26 min). Figure 2d presents, in green, the PSD of part of Event A (marked by a green box in Figure 2a). It shows a peak at 0.7 Hz similar to that of the 'natural ambient noise' (blue) but also shows increased energy beyond 2.5 Hz. Two other, higher peaks appear at about 8 Hz and 15 Hz.
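A receiver-averaged spectrogram like the one in Figure 2b could be formed along the following lines. This is only a sketch: the 10 s window, the synthetic records (white noise with an 8 Hz burst in the second minute), and the function name `averaged_spectrogram_db` are assumptions made for illustration, not the processing actually used for the figure.

```python
import numpy as np
from scipy.signal import spectrogram


def averaged_spectrogram_db(records, fs, window_seconds=10.0):
    """records: (n_receivers, n_samples). Returns the receiver-averaged PSD in dB."""
    nperseg = int(window_seconds * fs)
    freqs, times, sxx = spectrogram(records, fs=fs, nperseg=nperseg, axis=-1)
    mean_psd = sxx.mean(axis=0)  # average over receivers -> (n_freqs, n_times)
    return freqs, times, 10.0 * np.log10(mean_psd)


# Synthetic example: 4 hypothetical receivers, an "event" at 8 Hz after 60 s.
rng = np.random.default_rng(1)
fs = 500.0
t = np.arange(0, 120.0, 1.0 / fs)
records = 0.1 * rng.standard_normal((4, t.size))
burst = t >= 60.0
records[:, burst] += np.sin(2 * np.pi * 8.0 * t[burst])

freqs, times, spec_db = averaged_spectrogram_db(records, fs)
row_8hz = np.argmin(np.abs(freqs - 8.0))
quiet = spec_db[row_8hz, times < 55.0].mean()
loud = spec_db[row_8hz, times > 65.0].mean()
print(f"8 Hz level before/after the event: {quiet:.1f} dB / {loud:.1f} dB")
```

Averaging the PSD over receivers before converting to decibels keeps the result a power average; averaging decibel values instead would give a geometric mean.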

#### 2.2.3. Event B

In Figure 2a, Event B (marked by a red box) occurred during the period of 31 to 36 min. It is not obvious in the raw data. However, it becomes apparent in the spectrogram (solid red box in Figure 2b) and shows high energy in the frequency band of 1.5 to 5 Hz. Figure 2d presents the PSD of Event B in red. It shows a similar trend to the 'natural ambient noise' (blue) and Event A (green) below 0.7 Hz, but shows increased energy in the frequency band of 0.7 to 20 Hz. A higher peak appears at about 3 Hz.

**Figure 2.** Temporal and spectral analysis. Letters A, B and C denote different events detected. (**a**) the pressure recorded at one receiver. The pressure is normalized by the maximum amplitude. The blue box selects a window where the noise is not affected by severe events. The green box selects a window where the noise is affected by Event A. The red box selects a window where the noise is affected by Event B; (**b**) the averaged PSD (dB re 1 Pa²/Hz) of all receivers as a function of time and frequency (spectrogram); (**c**) the enlarged view of the noise marked by the blue box in (**a**); (**d**) the PSD of the corresponding noise marked by boxes in (**a**). The curves are related to the noise by the color of the boxes in (**a**). The black curve is the averaged PSD over 60 min.

#### 2.2.4. Events C

Some other events can be recognized in the spectrogram (denoted as Events C and marked by a dashed box in Figure 2b). These events have higher energy than the background noise and are usually generated by strong, directional sources.

#### 2.2.5. Discussion of the Events

Overall, Figure 2d indicates that the PSD is clearly contaminated by Events A and B, which are generated by strong, directional sources. The black curve shows the PSD of the 60-min recording and describes the average effect of all events. Compared with the blue curve, it shows increased energy beyond 1.5 Hz and is especially influenced by Events A and B.

Events such as A, B, and C are common and unpredictable in the real world. They damage the diffuse and equipartitioned properties of the noise and pose challenges to PSI.

#### 2.2.6. Additional Insights on the Events

Figure 2 shows that Event A has a very regular time interval. The temporal and spectral signatures of Event A indicate that it comes from artificial sources that closely resemble airgun shots from other seismic surveys near this field. Unfortunately, no exact information is available on these shots. To study these signals further, a travel time analysis is carried out. Thirty-one sensors on the circle in Figure 1 are selected and numbered counterclockwise (with the red five-pointed star numbered as 1). Figure 3a shows the arrival time of the airgun signal at the different sensors. The 4th sensor records roughly the earliest arrival (green) and the 25th sensor records the latest arrival (red). The 4th sensor and the 25th sensor are marked by a green asterisk and a small red circle in Figure 1, respectively. This indicates that the angle of incidence of the airgun shots is approximately 45° (the normal direction, which is perpendicular to the cable, is taken as the reference in this paper). The velocity of the signal, estimated by the travel time analysis, is about 1450 m/s. Figure 3b shows the spectrum of the signals in Figure 3a. It shows that the main energy of the airgun is concentrated in the frequency band of 3 to 200 Hz.

**Figure 3.** (**a**) arrival analysis on Event A. The green waveform represents the earliest arrival and the red waveform represents the latest arrival. The amplitudes of the signals are normalized by the maximum of each waveform; (**b**) the spectrum of Event A. Note that the magnitude is normalized to the maximum value.
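One common way to turn arrival times on a ring of sensors into a propagation direction and an apparent velocity is a least-squares plane-wave fit, t_i = t_0 + p · x_i, where p is the horizontal slowness vector. The sketch below illustrates that idea on synthetic data; it is not necessarily the procedure used for Figure 3a. The circle radius, the sensor coordinates, the true direction, and the function name `fit_plane_wave` are assumptions made for illustration.

```python
import numpy as np


def fit_plane_wave(xy, arrival_times):
    """Least-squares fit of t_i = t0 + p . x_i for a plane wave.

    xy: (n, 2) sensor coordinates in m; arrival_times: (n,) in s.
    Returns the apparent velocity (m/s) and the propagation azimuth
    (rad, measured counterclockwise from the +x axis).
    """
    design = np.column_stack([np.ones(len(xy)), xy])
    coeffs, *_ = np.linalg.lstsq(design, arrival_times, rcond=None)
    slowness = coeffs[1:]
    velocity = 1.0 / np.linalg.norm(slowness)
    azimuth = np.arctan2(slowness[1], slowness[0])
    return velocity, azimuth


# Synthetic check: 31 sensors on a circle (the 1000 m radius is an assumed value).
angles = 2 * np.pi * np.arange(31) / 31
xy = 1000.0 * np.column_stack([np.cos(angles), np.sin(angles)])
true_velocity, true_azimuth = 1450.0, np.deg2rad(30.0)
direction = np.array([np.cos(true_azimuth), np.sin(true_azimuth)])
arrival_times = xy @ direction / true_velocity  # noise-free plane-wave arrivals

velocity, azimuth = fit_plane_wave(xy, arrival_times)
print(f"velocity = {velocity:.1f} m/s, azimuth = {np.rad2deg(azimuth):.1f} deg")
```

The fitted value is the apparent horizontal velocity across the array; it equals the propagation velocity only if the wave travels horizontally, which is an additional assumption here.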

Figure 2b shows that the energy of Event B is concentrated in the frequency band of 1.5 to 5 Hz. This fits the spectrum of an earthquake very well [29]. Thus, it is possibly a small earthquake lasting a few minutes.

#### **3. NCC Retrieval Using the Adapted Eigenvalue-Based Filter**

Events similar to A, B, and C are unavoidable and unpredictable in the real world, challenging the accuracy and stability of the PSI. The adapted eigenvalue-based filter can help to mitigate the influence of these events and obtain improved NCC [22]. We apply the adapted eigenvalue-based filter to the noise data recorded by the PRMRA. In the following subsections, we summarize the basic ideas of the adapted eigenvalue-based filter (a detailed description can be found in [22]) and retrieve the NCC from the contaminated noise data using the adapted eigenvalue-based filter.

#### *3.1. An Overview of the Adapted Eigenvalue-Based Filter*

The main operation of the adapted eigenvalue-based filter is carried out on the SCM $\hat{\mathbf{R}}(f)$. Supposing that the noise records are divided into $M$ segments, the SCM $\hat{\mathbf{R}}(f)$ can be computed by

$$\hat{\mathbf{R}}(f) = \frac{1}{M} \sum_{m=1}^{M} \mathbf{u}_m(f)\mathbf{u}_m(f)^H, \tag{1}$$

where $\mathbf{u}_m(f)$ is the vector of Fourier coefficients of the noise records in segment $m$ at a particular frequency $f$, and $H$ denotes the Hermitian transpose.
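Equation (1) can be sketched numerically as follows. The sketch assumes non-overlapping segments, no windowing, and no normalization of the Fourier coefficients, none of which are specified in the text; the function name `sample_covariance_matrix` and the random test records are likewise illustrative.

```python
import numpy as np


def sample_covariance_matrix(records, n_segments):
    """SCM per frequency, following Equation (1).

    records: (n_receivers, n_samples) real-valued noise records.
    Returns an array of shape (n_freqs, n_receivers, n_receivers).
    """
    n_rec, n_samples = records.shape
    seg_len = n_samples // n_segments
    # Split each record into M non-overlapping segments (assumed segmentation).
    segments = records[:, : seg_len * n_segments].reshape(n_rec, n_segments, seg_len)
    u = np.fft.rfft(segments, axis=-1)  # (n_rec, M, n_freqs)
    # R[f, i, j] = (1/M) * sum_m u_{m,i}(f) * conj(u_{m,j}(f))
    return np.einsum("imf,jmf->fij", u, np.conj(u)) / n_segments


rng = np.random.default_rng(2)
records = rng.standard_normal((5, 4000))  # 5 hypothetical receivers
scm = sample_covariance_matrix(records, n_segments=8)
print(scm.shape)
```

Each frequency slice `scm[f]` is Hermitian and positive semidefinite by construction, which is what allows the eigenvalue decomposition used by the filter.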

The adapted eigenvalue-based filter tries to separate the SCM into different components as

$$\begin{split} \hat{\mathbf{R}} &= \sum_{k=1}^{K} \hat{\lambda}_{k} \hat{\mathbf{v}}_{k} \hat{\mathbf{v}}_{k}^{H} + \sum_{k=K+1}^{N'} \hat{\lambda}_{k} \hat{\mathbf{v}}_{k} \hat{\mathbf{v}}_{k}^{H} + \sum_{k=N'+1}^{N} \hat{\lambda}_{k} \hat{\mathbf{v}}_{k} \hat{\mathbf{v}}_{k}^{H} \\ &= \hat{\mathbf{R}}_{s} + \hat{\mathbf{R}}_{d} + \hat{\mathbf{R}}_{i}, \end{split} \tag{2}$$

where $\hat{\lambda}_k$ (in descending order) and $\hat{\mathbf{v}}_k$ are the eigenvalues and eigenvectors of $\hat{\mathbf{R}}$, respectively, $\hat{\mathbf{R}}_s$ is the component related to strong, directional noise, $\hat{\mathbf{R}}_d$ is the component related to diffuse noise, and $\hat{\mathbf{R}}_i$ is the component related to uncorrelated noise. The objective of the separation is to obtain a wavefield that is closer to being equipartitioned by suppressing $\hat{\mathbf{R}}_s$ and $\hat{\mathbf{R}}_i$ in the SCM.

The key point of the separation is the determination of values for $N'$ and $K$. $N'$ is called the cutoff number. Eigenvalues smaller than $\hat{\lambda}_{N'}$ are thought to be related to uncorrelated noise (such as electronic noise and sensor self-noise) and are filtered out. $N'$ can be determined by a theoretical derivation based on the geometry and the degrees of freedom of the wavefield [22,30]. Eigenvalues larger than $\hat{\lambda}_K$ are thought to be related to strong, directional sources and need to be suppressed. A statistical hypothesis test is applied to determine the value of $K$ [22]. By comparing the largest eigenvalues of the SCM with the statistical model of the ideal SCM (generated by a diffuse, equipartitioned wavefield), all the eigenvalues larger than $\hat{\lambda}_K$ are rejected. The weight of the hypothesis test enhances the effectiveness of the adapted eigenvalue-based filter and makes it more adaptable to different datasets.
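The separation in Equation (2) can be illustrated with a short sketch that keeps only the middle (diffuse) group of eigen-components at one frequency. It takes $K$ and the cutoff number (the upper index of the second sum in Equation (2)) as given inputs; the hypothesis test and the theoretical cutoff of [22] are not implemented. Setting the rejected eigenvalues to zero is a simplifying assumption, since the filter in [22] may suppress them differently, and the function name `filter_scm` and the synthetic matrix are hypothetical.

```python
import numpy as np


def filter_scm(scm, k_strong, n_cutoff):
    """Keep eigen-components k = K+1 .. N' (1-based, eigenvalues sorted descending).

    scm: (n, n) Hermitian matrix at one frequency.
    Rejected components are simply zeroed (an assumption of this sketch).
    """
    eigvals, eigvecs = np.linalg.eigh(scm)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = slice(k_strong, n_cutoff)        # 0-based indices K .. N'-1
    v = eigvecs[:, keep]
    return (v * eigvals[keep]) @ v.conj().T  # sum_k lambda_k v_k v_k^H


# Synthetic SCM: a random Hermitian PSD matrix plus a strong rank-one "event".
rng = np.random.default_rng(3)
n = 6
a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
scm = a @ a.conj().T
steer = np.ones((n, 1), dtype=complex)
scm_contaminated = scm + 100.0 * (steer @ steer.conj().T)

filtered = filter_scm(scm_contaminated, k_strong=1, n_cutoff=5)
print(np.linalg.eigvalsh(scm_contaminated).max(), np.linalg.eigvalsh(filtered).max())
```

With `k_strong=0` and `n_cutoff=n` nothing is rejected and the function returns the original matrix, which is a convenient sanity check.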

The basic strategies to determine $K$ and $N'$ are provided in Appendix A. Once the values for $K$ and $N'$ are determined, we can obtain the filtered SCM entries $\hat{R}_{ij}(f)$ by suppressing $\hat{\mathbf{R}}_s$ and $\hat{\mathbf{R}}_i$. The time-domain NCC function $C_{ij}(t)$ is defined as

$$C_{ij}(t) = \int_0^T s_i(\tau)s_j(\tau+t)\,d\tau, \tag{3}$$

where $s_i(t)$ and $s_j(t)$ denote the ambient-noise records obtained by receivers $i$ and $j$, respectively, and $T$ denotes the observation period. In the frequency domain, given the filtered SCM $\hat{\mathbf{R}}(f)$, we can compute the NCC of the data as

$$C_{ij}(t) = \mathcal{F}^{-1}[\hat{R}_{ij}(f)], \tag{4}$$

where $\mathcal{F}^{-1}$ denotes the inverse Fourier transform, $f$ denotes the frequency, and $\hat{R}_{ij}(f)$ denotes the $(i, j)$ entry of $\hat{\mathbf{R}}(f)$.
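The link between Equations (3) and (4) can be checked numerically with a discrete, circular version of the correlation. In the sketch below the cross-spectrum is formed as conj(u_i) u_j so that a positive lag means receiver j lags receiver i, matching the sign convention of Equation (3); with the opposite conjugation order the lag axis is mirrored. The synthetic records, the 37-sample delay, and the function name `ncc_from_spectrum` are illustrative assumptions.

```python
import numpy as np


def ncc_from_spectrum(cross_spectrum):
    """Inverse FFT of a cross-spectrum sampled on the full FFT frequency grid."""
    return np.fft.ifft(cross_spectrum).real


rng = np.random.default_rng(4)
n, delay = 1024, 37
s_i = rng.standard_normal(n)
s_j = np.roll(s_i, delay)  # receiver j sees the same noise 37 samples later

u_i, u_j = np.fft.fft(s_i), np.fft.fft(s_j)
ncc = ncc_from_spectrum(np.conj(u_i) * u_j)
peak_lag = int(np.argmax(ncc))  # lags 0 .. n-1; indices >= n/2 are negative lags

# Direct (circular) evaluation of sum_tau s_i(tau) * s_j(tau + lag) for comparison.
direct = np.array([np.dot(s_i, np.roll(s_j, -lag)) for lag in range(n)])
print(peak_lag, np.allclose(ncc, direct))
```

The frequency-domain route gives the same values as the direct sum at a fraction of the cost, which is why the NCC is retrieved from the filtered SCM entries rather than by time-domain correlation.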

#### *3.2. The Result*
