**1. Introduction**

Multibeam echo sounders (MBESs) are important remote-sensing acoustical systems whose primary goal is mapping the seabed. They are also widely used to detect targets in water columns [1]. Many types of MBESs can collect water column image (WCI) data, which carry backscattering signals of scatters from the transducer to the seabed. The images can be used to detect artificial or natural structures in water columns, such as gas bubbles rising from seep sites [2–8] or gas pipelines [9], shipwrecks [10], fish schools [11], in addition to serving as a reference for the quality control of multibeam bathymetric data.

WCIs use the differences in acoustic characteristics, such as backscattering strength or target strength, to detect solid, liquid, or gas targets by distinguishing them from the background images. For the gas emissions discussed in this paper, their appearance in images is flare-like [12] and tends to rise from the source. In addition, the ascending gases and other scatterers may be deflected by

water/sea currents [4]. Generally, these may be the result of leaks from man-made structures, such as underwater gas pipelines, or gases that were released from the seabed. Over the past two decades, research on gas emissions using image data has mainly focused on the aspects of detection (presence or absence) and positioning, as well as their quantification [2,12–18]. A typical method in the detection of gas emissions is to mark the suspected gas target by manually screening the image data collected by the MBES [1,5,12,16]. Therefore, it is necessary to study methods automatically detecting gas emissions and improving data processing efficiency [19].

In general, from the perspective of different input data sources, there are two ways to automatically detect leaking gases using MBES. One way is to detect targets in water columns by processing beam space data of MBES, such as the multi-detection technique proposed by Christoffersen et al. [20]. Compared to traditional multibeam seabed detection technologies (e.g., the weighted mean-time and the zero-crossing of the phase difference), multi-detection algorithms have been applied in some MBES applications and can be applied to detect seabed topography and targets suspended in water columns at the same time.

Another method is to detect targets using the WCIs as data sources. For instance, some automatic detection methods of bubble leakage via the images have been presented by Urban et al. and Zhao et al. [12,15], and were used to localize the bubble vent. However, it is often difficult to avoid the strong interference from sidelobes when the pixel intensity data in the images are used for target detection [12,21]. One solution to address this problem is to exclude WCI data outside a minimum slant range (MSR) [12], while the excluded area may contain part of the target image, if one is present. Furthermore, if a gas-detection algorithm relies only on image intensity information, without other auxiliary methods in data interpretation, it cannot confirm whether the detected targets are rising gases or if it is from other targets, such as underwater artificial structures with the same outline as gas flares. Therefore, coherent flow structures have been used as a feature of detection [22], and the motion of rising bubbles was considered [18] to detect leakage via cross-correlation of the WCIs in the depth-across orientation. This method not only discriminates the presence or absence of spilled gases, but also distinguishes them from fish, which is one of the major reasons for misdetections. However, in this method, the potential situation of false but large motion speed due to continuous frame changes of the strong backscatter pixels from the seafloor has not been considered.

The optical flow method is another motion-estimation technique. It has been validated using suspended objects (e.g., leaking gases [23] or sulfur dioxide flux) from infrared images or CCD images and has already been suggested as a promising method for detection of gas leaks moving patterns by von Deimling et al. [18]. In this paper, two types of information, amplitude and velocity, are comprehensively considered, and an automatic detection method for underwater gas leaks is designed to distinguish other strong scatterers in the water such as the seafloor. In addition, the above methods are mainly based on "depth-across track" (D-T) image, which needs the process of converting from the beam space data to the D-T image. This paper reduces the above process, and further proposes to design a more efficient parallel structure based on the beam data. In other words, we use the Farneback optical flow method to estimate the motion of underwater gas emissions. In our process, the potential interference of sidelobes on the leaking gas images is suppressed before WCI generation. Subsequently, we designed two processing schemes corresponding to the two types of images, and the intensities and velocity information of the images are combined for discrimination. Finally, we tested the effectiveness of the above two processing schemes by simulating pipeline leakage scenarios in water tanks and Songhua Lake in China.

#### **2. Materials and Methods**

#### *2.1. Water Column Image (WCI) Generation and Sidelobe Suppression*

After receiving underwater multi-channel echo signals, the MBES usually needs to perform array beamforming processing to obtain the time series of echo amplitudes at different beam angles. Then, sonar equation [24] can be used to compute the intensity of the echo signals at each beam, the backscattering strength from the seabed, or the target strength from other targets in water. In general, these echo data at all beam directions can be visually displayed in two typical WCIs [17,25]. One is that the amplitude/intensity time series are directly arranged into a two-dimensional data matrix, which is displayed as a two-dimensional image with coordinate axes corresponding to time and beam angle, referred to as a "time-angle" (T-A) image. Figure 1a is a T-A image, in which the amplitude time sequences are obtained using the discrete Fourier transformation (DFT) of the multi-channel echo signals. Moreover, the T-A image is often used as a data source for seafloor detection methods. The vertical and horizontal coordinates of this image correspond to the beam angle and the two-way travel time (TWTT) of the echo signals. In the figure, multi-channel echo data was acquired on the lake by using the HT300 PA MBES mentioned later. The emission signal of sonar is a CW pulse signal with a pulse width of 0.1 ms. The other WCI is the amplitude/intensity time series in polar coordinates, converted into two-dimensional images in Cartesian coordinates, with its horizontal and vertical coordinates corresponding to the horizontal distance and vertical depth, respectively. This is referred to as a "depth-across track" (D-T) image (as shown in Figure 1b). The complexity of the coordinate transformation is embodied in the need to consider the influence of irregular beam space [25], and thus it may be necessary to interpolate pixels without data in the beam sector of the D-T image. Moreover, it is necessary to correct the spatial position of the amplitude/intensity data in Cartesian coordinates by ray tracking when the sound velocity in water varies significantly with the depth. The coordinate system corresponding to Figure 1b is more suitable for observation with the human eye; thus, it is often displayed in the display and control software of a sonar system. In addition, from the areas within the red boxes in Figure 1a, it can be seen that echo energy from the beam, perpendicular to the seabed, leaks into the main lobe direction of all other beams, such that obvious stripe patterns appear at and outside the MSR position. Correspondingly, the straight stripe patterns appear curved in Figure 1b.

**Figure 1.** Examples of T-A and D-T images, and the results of sidelobe effect elimination based on the minimum variance distortionless response (MVDR) beamforming methods [26]. (**a**,**c**) are the T-A images, whose ordinates and horizontal coordinates correspond to the beam angle and two-way travel time (TWTT). (**b**,**d**) are the D-T images, whose ordinates and horizontal coordinate correspond to vertical depth and horizon range. (**a**,**b**) are obtained using a discrete Fourier transformation (DFT) beamforming method, and (**c**,**d**) are the processing results via a MVDR beamforming method.

Currently, in the above WCI generation process, most MBESs must use DFT or fast Fourier transformation (FFT) techniques to achieve beamforming tasks. These techniques offer simplicity and good real-time performance; however, they have energy leakage, which can cause sidelobe interference. Thus, the echo energy from the beam in the direction perpendicular to the seabed or the strong scattering in the water column leaks into the main lobe direction of all the other beams. For bathymetric

measurements, a typical effect would be to mistake the flat seabed topography for a curved seabed topography with both sides upturned. The sidelobe effect here is also called a "tunnel effect" [27]. The sidelobe effect is a strong interference source in seabed topography survey and underwater target detection. Tangible and physical targets in water columns, gaseous or not, are real structures and can produce backscatters. However, the sidelobe effect can also produce a false target image and its intensity level cannot be ignored, which can hinder the MBES to detect real targets. Therefore, it is important to acquire WCI with suppressed sidelobe effects before the target detection methods are performed [12,28].

When the sidelobes are eliminated based on MBES, object data can be mainly processed in three types: channel signals, beam output sequences, and WCIs in different stages of data processing of the MBES. Firstly, the channel signal is processed using beamforming techniques, such as the minimum variance distortionless response (MVDR) algorithm [26], to suppress the sidelobe leakage. Secondly, adaptive filters can be used to dispose beam output sequences to offset the sidelobe effect [29]. Finally, the suppression of sidelobe effects can be achieved by excluding data outside the MSR. In this work, the MVDR beamforming method is used to process the channel signal data collected by the MBES to reduce sidelobes in the WCIs. The array output of this beamforming method can be expressed as [30]:

$$F(n) = \mathbf{w}^H \mathbf{x}(t\_\varepsilon) \tag{1}$$

where **x**(*te*) is the array data vector; *te* is discrete time of echo signal; and **w** is the array weight coefficient, which is defined as:

$$\mathbf{w} = \frac{\mathbf{R}\_{\text{xx}}^{-1} \mathbf{a}(\theta)}{\mathbf{a}(\theta)^{H} \mathbf{R}\_{\text{xx}}^{-1} \mathbf{a}(\theta)} \tag{2}$$

where **a**(θ)is the array manifold vector, H denotes matrix Hermitian transpose, and **R***xx*= E **x**(*te*)**x***H*(*te*) is the correlation matrix of the array data. Furthermore, for a spherically isotropic noise field, the directional gain of the array can be expressed by a directivity index (DI) written as [30]:

$$DI = 10 \log \frac{\left| \mathbf{w}^H \mathbf{a}(\theta) \right|^2}{\mathbf{w}^H \mathbf{R}\_{\text{vv}} \mathbf{w}} \tag{3}$$

where **R***vv* represents an isotropic noise correlation matrix whose *mn*th matrix element is defined as:

$$\langle \mathbf{R}\_{\upsilon\upsilon} \rangle\_{mn} = \frac{\sin[(m-n)kd]}{(m-n)kd}, \, m, n = 1, 2, \ldots, M \tag{4}$$

where *M* is the number of array elements, *d* is the element spacing, and *k* is the wave number. Figure 1c shows a T-A image obtained by using the MVDR beamforming method with the same data as Figure 1a, and Figure 1d is the D-T image corresponding to Figure 1c. By comparing the four figures in Figure 1, it can be seen that the MVDR beamformer can successfully suppress the sidelobe leakage.

#### *2.2. Motion Estimation of Gas Emissions Via Farneback Optical Flow*

#### 2.2.1. Motion Estimation Using D-T Images

Leaking gases exist in the form of bubble groups in the water column. They diffuse and rise to the surface. These gases result in changes in the intensity distribution in multi-frame WCIs, which can be exploited using the optical flow method to estimate the motion information of the gas emissions. The optical flow calculation [31] uses the variation and correlation of pixel values in two sequential image frames to determine the "motion" of each pixel location. As a basic condition for the application of the optical flow method, it is assumed that the pixel values in the two images satisfy d*F x*, *y*, *tf* /d*tf* = 0, that is:

$$F(\mathbf{x}, y, t\_f) = F(\mathbf{x} + \Delta \mathbf{x}, y + \Delta y, t\_f + \Delta t\_f) \tag{5}$$

The generation and extinction of the bubble group, their distribution, and changes in size as they rise will cause rapid changes in the image, which does not occur in sonar images of fixed-shape objects. Therefore, to satisfy the above assumptions, the time interval between frames cannot be too long. Thus, in this paper, when the experimental data is collected in pool and lake, the frame rates of images are 20 Hz and 16 Hz, respectively. Motion information of leaking gasses has been estimated using the Farneback optical flow method, which is a classical dense optical flow estimation algorithm for target motion using images, and its basic principle and implementation are described in detail in [31]. The main idea of this method is to approximate the neighborhood near each pixel of an image as a quadratic polynomial, and to estimate the displacement field of the two frames using a polynomial expansion. Specifically, the neighborhood of each pixel of two images (*F*1, *F*2) at different times can be approximated as:

$$\begin{cases} F\_1(\mathbf{z}) = \mathbf{z}^T \mathbf{A}\_1 \mathbf{z} + \mathbf{b}\_1^T \mathbf{z} + c\_1 \\\ F\_2(\mathbf{z}) = \mathbf{z}^T \mathbf{A}\_2 \mathbf{z} + \mathbf{b}\_2^T \mathbf{z} + c\_2 \end{cases} \tag{6}$$

where **z** is coordinate vector of image pixel, parameter **A** is a symmetric matrix, **b** is a vector and *c* is a scalar. It is assumed that *F*<sup>2</sup> is equal to *F*<sup>1</sup> with a global displacement **d**, that is,

$$F\_2(\mathbf{z}) = F\_1(\mathbf{z} - \mathbf{d})\tag{7}$$

By using Equation (6) to expand the two functions at both ends of Equation (7) and making the coefficients of the items equal, the global displacement can be obtained. From a visual point of view, an image whose horizontal and vertical axes are represented by the distance is more conducive to directly show the shape of the target. This is also the reason why D-T images are commonly shown in the display and control software of the sonar system rather than T-A images. When the displacement is estimated from two D-T images and divided by the time interval between the two frames, the velocity of each pixel can be obtained. Scheme A of Figure 2 describes a processing flow from beamforming processes of multi-channel echo data to estimate velocity fields and detect targets. The multi-channel echo data are converted into beam space data or a T-A image through the beamforming processing, and then the D-T image can be produced using a coordinate transformation and interpolation.

**Figure 2.** Two types of data processing schemes in motion estimation of gas emissions.

Some of the methods (e.g., [1,12,17]) to detect gas leaks based on MBESs only used the amplitude/ intensity information of D-T images in Scheme A. However, if the detection is only based on amplitude/intensity, it cannot automatically distinguish between strong scatterers such as gas leaks and the seafloor. The motion of rising bubbles was considered [18] to detect leakage via cross-correlation; however, the potential situation of false but large motion speed due to continuous frame changes of the strong backscatter pixels from the seafloor has not been considered. In this paper, two types of information, intensities and velocity vectors, are comprehensively considered, and an automatic

detection method for underwater gas leaks is designed to distinguish other strong scatterers in the water such as the seafloor (see Section 2.2.3). Specifically, the D-T images are processed using Farneback optical flow to obtain the velocity vector field.

Furthermore, in Scheme A, all feature information originates from the D-T images; thus, a process of converting from the beam space data to the D-T image is required. For a sonar system with automatic online detection requirements, the serial structure brings with it large hardware costs and affects real-time computing efficiency. For this reason, this paper further proposes the design of a more efficient parallel structure based on beam data. This new structure will affect velocity vector estimation; thus, it is necessary to further derive a corresponding conversion calculation method.

#### 2.2.2. Motion Estimation Using T-A Images

Scheme A in Figure 2 is mainly a serial structure in which stringent requirements are often imposed on the real-time processing capability of the sonar hardware platform. This paper thus proposes Scheme B to directly estimate the motion information of the leaking gases using the T-A image, thereby reducing the processing steps from multi-channel data acquisition to target detection based on velocity. Moreover, the velocity field estimation can be processed in parallel with the D-T image and bathymetric estimation algorithms, instead of relying on serial processes as is done in Scheme A. In Scheme B, the Farneback optical flow method is used to process two frames of the T-A image to obtain relative displacement (*t*0, θ0) in the direction of echo time and beam angle axis. When the displacement is divided by the time interval, the velocity vector (*ut*, *v*θ) in the coordinate system with time/orientation as the coordinate axis can be obtained. During the hydrographic surveying, with the fluctuation of the vessel, the sonar array will produce changes in orientation, such as roll and pitch, which will bias the estimated target motion displacement. Fortunately, modern multibeam sonar products usually have roll- and pitch-compensation capabilities, and even medium and deep water MBES can compensate for the heading information, thus avoiding the above-mentioned errors.

The velocity vector (*ut*, *v*θ) estimated from the T-A images can be used to complete the task of leaking gas detection. There is often a requirement to display velocity information in conjunction with the D-T image. In such cases, it is necessary to convert (*ut*, *v*θ) into a motion velocity vector *ux*, *vy* with horizontal and vertical displacement per unit time, that is, velocity, as coordinates. Under the T-A image with time-beam angle as the coordinate system, the coordinate transformation of (*t*0, θ0) can be expressed linearly as:

$$
\begin{pmatrix} t' \\ \partial' \end{pmatrix} = \begin{pmatrix} t \\ \partial \end{pmatrix} + \begin{pmatrix} t\_0 \\ \partial\_0 \end{pmatrix} \tag{8}
$$

Then, the displacement (*x*0, *y*0) of the point in the Cartesian coordinate system can be expressed as:

$$
\begin{pmatrix} \mathbf{x}\_0 \\ \mathbf{y}\_0 \end{pmatrix} = \begin{pmatrix} \frac{c\left(t + t\_0\right)\sin\left(\theta + \theta\_0\right) - ct\sin\left(\theta\right)}{2} \\\ \frac{c\left(t + t\_0\right)\cos\left(\theta + \theta\_0\right) - ct\cos\left(\theta\right)}{2} \end{pmatrix} \tag{9}
$$

where *c* is the speed of sound at the specified depth of the pixel. Then, *ux*, *vy* can be obtained by the division of (*x*0, *y*0) and time interval.

### 2.2.3. Detection of Gas Emissions

When using the MBES to detect gas emissions, the possible interference sources include sidelobe effects, backscatter contributions from non-gas targets in water (e.g., volume reverberation, seabed reverberation), and background ambient noise, among others [15]. An amplitude threshold can be used to suppress the interference of volume reverberation with low energy contributions. This threshold can be set to a dynamic value varying with the frames, or to a fixed value. The following threshold can be used:

$$|P(i,j) - P\_{ref}| \gg k\sigma \tag{10}$$

where *P*(*i*, *j*) is the value of each pixel in the WCI, *k* is the adjustable weight factor, and the values *Pre f* and σ are the median and standard deviation of all pixel values of the image, respectively. If the pixel in the frame satisfies Equation (10), then it is judged to belong to a part of a leaking gas image; otherwise, the pixel value is eliminated.

For a strong backscattering and relatively static target, the same decision criterion as in Equation (10) can be used, but the data involved in the calculation is replaced by the magnitude of the velocity estimated, that is, *V* = *u*2 *<sup>x</sup>* + *v*<sup>2</sup> *<sup>y</sup>*. Furthermore, with regard to the relatively flat seabed topography, when a survey vessel is sailing, the image of the area corresponding to the seabed in WCIs of the adjacent pings will show the partial changes of the energy distribution in the horizontal direction, which is mainly due to the statistical fluctuation of the seabed echo energy caused by the fluctuation of the micro-topography. This phenomenon may also bring about large velocities corresponding to the seabed portion of the velocity field, and most of the velocity direction should be similar to the seabed terrain. In consideration of the above situation, a directional threshold is further set to exclude the target image in the horizontal direction or close to it, in the estimated velocity field. Specifically, in this paper, the threshold on the velocity direction relative to the horizontal is set at 40 degrees. The rising gases may be deflected by currents [4], and thus the range of the velocity direction threshold should be adjusted according to the direction of the ocean current in the sea test. In addition, schools of fish are also common underwater moving targets. Through this direction threshold, some fish school images shown in the horizontal direction can be eliminated.
