Thresholding Dolphin Whistles Based on Signal Correlation and Impulsive Noise Features Under Stationary Wavelet Transform

Zhou, Xiang; Wu, Ru; Chen, Wen; Dai, Meiling; Zhu, Peibin; Xu, Xiaomei

doi:10.3390/jmse13020312


by Xiang Zhou ¹, Ru Wu ¹, Wen Chen ¹, Meiling Dai ²,*, Peibin Zhu ¹,* and Xiaomei Xu ³

¹ School of Ocean Information Engineering, Jimei University, Xiamen 361021, China
² School of Marxism, Jimei University, Xiamen 361021, China
³ College of Ocean and Earth Sciences, Xiamen University, Xiamen 361005, China
* Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(2), 312; https://doi.org/10.3390/jmse13020312

Submission received: 15 January 2025 / Revised: 3 February 2025 / Accepted: 6 February 2025 / Published: 7 February 2025

(This article belongs to the Section Ocean Engineering)


Abstract: The time–frequency characteristics of dolphin whistle signals under diverse ecological conditions and during environmental changes are key to studying how dolphins adapt and respond to the marine environment. Current wavelet thresholding methods struggle to select precise thresholds under low signal-to-noise ratio (SNR), which limits the quality and usability of dolphin whistles recorded by passive acoustic monitoring (PAM). This paper presents a thresholding denoising method based on the stationary wavelet transform (SWT) that combines impulsive noise suppression with the autocorrelation function (SI-ACF) to select precise thresholds. The method introduces a denoising metric, $ρ$, based on the correlation of whistle signals, which facilitates precise threshold estimation under low SNR without requiring prior information. Additionally, it exploits the high amplitude and broadband characteristics of impulsive noise, and utilizes the multi-resolution information of the wavelet domain to remove impulsive noise through a multi-level sliding window approach. The SI-ACF method was validated using both simulated and real whistle datasets. Simulated signals were employed to evaluate the method’s denoising performance under three types of typical underwater noise. Real whistles were used to confirm its applicability in real scenarios. The test results show that the SI-ACF method effectively eliminates noise, improves whistle signal spectrogram visualization, and enhances the accuracy of automated whistle detection, highlighting its potential for whistle signal preprocessing under low SNR.

Keywords: dolphin whistle; stationary wavelet transform; bioacoustics; impulsive noise; wavelet thresholding; autocorrelation function

1. Introduction

Marine mammals are vital for maintaining the health of marine ecosystems. As top predators, species such as dolphins significantly influence the balance of the entire marine ecosystem. Dolphins are recognized as indicators of marine ecosystem health status [1], and their pronounced responses to environmental variations or changes make them valuable as ecosystem sentinels [2].

Dolphins communicate through acoustic signals, which are classified into three types: whistles, clicks, and burst pulses. All these sounds serve social purposes [3]. Whistles, narrowband frequency-modulated signals, are the basis of vocal exchanges between individuals and convey information about identity, behavioral states, and environmental conditions [4]. Dolphins primarily use clicks for environmental perception and hunting, while burst pulses are mainly associated with social interactions and occur more frequently during such states than other behavioral states [5].

The acoustic parameters of dolphin whistle signals (e.g., start frequency, end frequency, minimum and maximum frequency, bandwidth, duration, and the number of inflection points) vary due to factors such as activity status, group size, composition, geographic location, and ambient noise levels [6]. Bottlenose dolphins convey individual identity-related information through unique signature whistles. Studies show significant differences in the fundamental frequency contours of signature whistles among individual dolphins, with changes observed in the same dolphin across developmental stages [7]. Notable acoustic differences exist among dolphin populations and whistle types [8], with geographic isolation potentially contributing to intra-species vocalization variations [9]. Vessel activity and engine noise affect dolphin whistle parameters, especially when vessels are nearby [10]. In underwater communication, whistle signals are crucial for bionic covert underwater acoustic communication technologies, enabling secure information transmission [11,12]. In summary, dolphin whistle signals have significant research value in ecology, biology, and underwater communication.

Passive acoustic monitoring (PAM) is a reliable and effective method for the long-term monitoring of marine mammals [13]. PAM supports monitoring marine life dynamics, promotes conservation efforts, and guides sustainable management practices. However, manually monitoring and annotating dolphin whistle signals from large PAM datasets is both time-consuming and labor-intensive. Additionally, ambient noise, especially anthropogenic noise, is inevitably introduced during signal acquisition. For instance, during a container ship’s passage, the broadband received sound-pressure levels recorded by hydrophone can reach 155 dB re 1 μPa [14], further complicating the extraction of whistle signal features.

Efficient and accurate automatic detection, extraction, and classification of whistle signals remain major challenges in the complex and dynamic underwater environment. Researchers have proposed various innovative approaches in recent years. Serra et al. used ridge detection on time–frequency spectrograms to extract whistle contours and applied a random forest classifier for whistle classification [15]. Kipnis and Diamant used an image clustering method to optimize the connection process between whistle-contour tracking and counting [16]. Li et al. proposed a staged generative adversarial network (GAN) framework to generate training samples for automatically extracting whistle signals from time–frequency spectrograms [17]. While spectrogram-based approaches show promising results in processing dolphin whistles, they struggle to restore the temporal sequence characteristics of whistle signals. Mallawaarachchi et al. proposed a method that suppresses transient noise on the time–frequency spectrogram and utilizes pixel-based differentiation to eliminate non-impulse environmental noise, facilitating the extraction of the fundamental frequency of whistle signals while retaining phase information to reconstruct denoised signals [18]. However, when non-pulse ambient noise and whistle signals have similar energy levels, subsequent processing struggles to accurately distinguish between noise and signals, hindering effective extraction. Gruden and White applied a multi-target frequency-tracking sequential Monte Carlo probability hypothesis density (SMC-PHD) filter for automatic dolphin whistle-contour extraction [19]. They also used dual normalization in the frequency and time domains to reduce noise and enhance whistle signal features before extraction [20]. However, insufficient noise reduction under low SNR limits its applicability in complex environments. In summary, noise masks the features of whistle signals in the time–frequency domain. 
Optimizing denoising techniques is crucial for improving the performance of existing methods in whistle signal detection, extraction, and classification.

Preprocessing denoising improves the visualization of dolphin whistle signals in time–frequency spectrograms, aiding the analysis of their acoustic properties. For non-stationary signals, commonly used preprocessing methods include wavelet denoising [21], empirical mode decomposition (EMD) [22], variational mode decomposition (VMD) [23], principal component analysis (PCA) [24], and adaptive filter-based dynamic denoising techniques [25,26]. Although these methods can effectively remove most noise based on noise levels or signal statistical characteristics, dolphin whistle signals transmitted through underwater acoustic channels may exhibit weak energy in certain parts. Denoising must ensure the preservation of all signal features to enable subsequent processing and analysis. For example, explosive events may cause significant changes in dolphin whistle signals [27], while dolphins may employ amplitude compensation to cope with increased environmental noise to avoid signal masking [28]. Thus, the denoising process must adapt to both time-varying noise and variations in whistle signals. Wavelet denoising enables flexible threshold setting based on signal and noise characteristics at different scales. In contrast, other time–frequency denoising methods lack controllable parameters to adjust the degree of noise removal. Wavelet thresholding is the optimal method for achieving controllable noise reduction while preserving the complete structure of the dolphin whistle signals. Nevertheless, traditional thresholding methods and recent hybrid approaches integrating wavelet techniques typically rely on the statistical distribution of noise or signal characteristics to estimate the threshold [29,30,31], such as the median absolute deviation within wavelet coefficient levels and the noise intensity differences between levels. However, these methods often result in over-denoising or insufficient denoising under time-varying underwater noise conditions.

This study utilizes SWT decomposition, which enables an accurate approximation of the original signal components while requiring few parameter selections. Meanwhile, the noise interfering with dolphin whistle signal analysis is categorized into impulsive and non-impulsive types for stepwise removal in the wavelet domain. The main contributions are summarized as follows:

(1): A denoising metric, $ρ$, was introduced to estimate dolphin whistles under low SNR using signal correlation. A fast algorithm was developed to determine its optimal threshold, enabling the separation of fundamental frequency whistles from wavelet coefficients at different levels. The method effectively suppresses non-impulsive noise in the same frequency band while preserving whistle signal integrity and enhancing its distinct features;
(2): A wavelet-level sliding window approach was proposed to address impulsive noise in wavelet coefficients. This method processes impulsive noise coefficients by analyzing amplitude differences within the window and incorporating neighboring amplitude levels. Time–frequency spectrogram observations confirm its effectiveness in removing impulsive noise while maintaining whistle signal continuity;
(3): SWT decomposition levels were determined based on the dolphin whistle frequency range and the signal sampling rate to prevent excessive wavelet decomposition. The two proposed denoising methods (SI-ACF), relying on SWT decomposition, were validated using three types of typical underwater noise and real whistle data. The results demonstrate the SI-ACF method’s effectiveness in reducing noise interference in complex, time-varying underwater environments.

The rest of this paper is organized as follows. Section 2 outlines the fundamental principles of SWT decomposition and explains the basis for the proposed thresholding approach. Section 3 details the SI-ACF method denoising framework and processing steps. Section 4 evaluates the proposed SI-ACF method under three typical underwater noise scenarios (vessel noise, wind and wave noise, and background noise). It compares the denoising performance with other wavelet thresholding techniques and analyzes its application to a real whistle signal dataset. Finally, Section 5 presents the key conclusions and provides insights for future research directions.

2. Background and Theory

Time–frequency domain techniques are considered among the most effective solutions for extracting information from underwater acoustic signals [32]. Time–frequency spectrograms visually represent variations in dolphin whistle signals across time and frequency dimensions and are commonly used to extract their acoustic parameters. However, extracting whistle signals is primarily challenged by interference from ocean ambient noise. Wavelet transform, a multiscale analysis tool, uses windows of varying sizes for high- and low-frequency regions, allowing it to capture both transient changes and overall trends in signals. By analyzing wavelet coefficients at different frequency scales, this approach effectively separates noise from the target signal.

2.1. Stationary Wavelet Transform

To overcome the lack of shift invariance in the sub-bands of the discrete wavelet transform (DWT), the SWT was introduced [33]. By removing the downsampling operator, the SWT achieves shift invariance and, as a result, exhibits redundancy, positioning it as an intermediate representation between the highly redundant continuous wavelet transform (CWT) and the non-redundant DWT. Similar to the CWT, the SWT offers precise time–frequency resolution while overcoming the limitation of the DWT, where minor time-domain shifts in the signal can cause significant variations in the energy distribution of coefficients across different levels [34]. Due to its redundancy and shift-invariant properties, the SWT decomposition generates coefficients that more closely approximate the original signal components, and its reconstructed signal components are also more accurate. These properties make the SWT more robust and reliable when processing non-stationary and nonlinear signals.

The wavelet coefficients at each level of SWT decomposition have the same length as the original signal. For a $J$-level SWT decomposition, the length of the signal is required to be an integer multiple of $2^{J}$. The SWT decomposition process can be expressed as follows:

$$A_{j+1}(n) = \sum_{k \in \mathbb{Z}} h_{j}(n - k) \cdot A_{j}(k) \tag{1}$$

$$D_{j+1}(n) = \sum_{k \in \mathbb{Z}} g_{j}(n - k) \cdot A_{j}(k) \tag{2}$$

$$h_{j+1} = \operatorname{upsample}(h_{j}, 2), \quad g_{j+1} = \operatorname{upsample}(g_{j}, 2) \tag{3}$$

Here, $A_{j}$ and $D_{j}$ represent the approximation and detail coefficients at the $j$-th level of wavelet decomposition, respectively. $h_{j}$ and $g_{j}$ are the impulse responses of the low-pass and high-pass filters at level $j$; this pair of orthogonal mirror filters is upsampled at each level to partition the signal into wavelet sub-bands.

Figure 1 illustrates the process of two-level decomposition using SWT applied to a signal with a sampling rate of $f_{s}$. The signal is decomposed into approximation coefficients ($A_{1}$) and detail coefficients ($D_{1}$) with the wavelet scaling function and orthogonal mirror filters. Subsequently, the approximation coefficients $A_{j}$ at the $j$-th level can be further decomposed into sub-bands using the filters $h_{j}$ and $g_{j}$. The decomposed signal is divided into multiscale components for analysis, enabling wavelet coefficients at each level to be processed individually based on the signal and noise characteristics. The processed wavelet coefficients are reconstructed into a signal using the time-reversed form of the decomposition filters. In the frequency domain, wavelet decomposition is a recursive subdivision of the signal spectrum within the Nyquist frequency range, gradually dividing it into narrower frequency sub-bands.
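The recursion in Equations (1)–(3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a Haar filter pair and circular (periodic) boundary handling, neither of which is specified by the text, and the function names `upsample`, `circular_filter`, and `swt` are hypothetical helpers rather than a library API.

```python
import math

def upsample(filt, factor=2):
    """Insert (factor - 1) zeros between filter taps, as in Equation (3)."""
    out = []
    for tap in filt[:-1]:
        out.append(tap)
        out.extend([0.0] * (factor - 1))
    out.append(filt[-1])
    return out

def circular_filter(signal, filt):
    """Circular convolution: out(n) = sum_k filt(k) * signal(n - k)."""
    n_samples = len(signal)
    return [
        sum(tap * signal[(n - k) % n_samples] for k, tap in enumerate(filt))
        for n in range(n_samples)
    ]

def swt(signal, levels):
    """Return ([D_1, ..., D_J], A_J) using a Haar pair (assumed for illustration)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    s = 1.0 / math.sqrt(2.0)
    h, g = [s, s], [s, -s]          # low-pass and high-pass impulse responses
    approx, details = list(signal), []
    for _ in range(levels):
        details.append(circular_filter(approx, g))   # Equation (2)
        approx = circular_filter(approx, h)          # Equation (1)
        h, g = upsample(h), upsample(g)              # Equation (3)
    return details, approx

x = [math.sin(0.3 * n) + 0.1 * n for n in range(16)]
details, approx = swt(x, 2)
print([len(d) for d in details], len(approx))  # every band keeps the input length
```

Because no downsampling is applied, every coefficient sequence has the same length as the input, which is the redundancy described above. With these normalised filters and periodic boundaries, the energy of $A_{j+1}$ plus that of $D_{j+1}$ is twice the energy of $A_{j}$, which gives a simple consistency check on the decomposition.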

2.2. Ocean Ambient Noise

Ocean ambient noise is generally classified into three types: biological noise (from marine mammals, fish, and invertebrates), natural noise (caused by wind, waves, rain, and underwater seismic activity), and anthropogenic noise (from human activities like vessel traffic and oil exploration) [35]. These sources collectively create a time-varying and complex ocean ambient-noise environment. According to the statistical characteristics of underwater noise, the underwater acoustic channel contains both Gaussian and non-Gaussian noise, such as impulsive noise (Figure 2) [36]. In marine-mammal acoustic signal processing, noise is typically categorized as impulsive or non-impulsive [37]. Additionally, whistle signals collected using PAM are often superimposed with various types of ocean ambient noise [38]. Therefore, this study employs an additive noise model for analysis, and removes impulsive and non-impulsive noise from ocean ambient noise in a stepwise manner.

Impulsive noise is characterized by an amplitude that is discontinuous from the surrounding signal, exhibiting high energy, short duration, and broadband characteristics. The detail coefficients from SWT decomposition capture the local time-domain features of impulsive noise, while the equal-length, multi-level wavelet coefficients ensure precise time–frequency localization. This enables the effective removal of impulsive noise while preserving the continuity of whistle signals. Due to the significant differences in amplitude, Section 3.2 will provide a detailed explanation of the principles and process for removing impulsive noise. This section focuses on analyzing the distinctions between whistle signals and non-impulsive noise.

Non-impulsive noise is defined as noise that can be removed without affecting the weaker energy components of the whistle signal. Removing this noise preserves the whistle signal’s integrity and prevents breakpoints that may cause discontinuities. Non-impulsive noise is mitigated using wavelet threshold denoising, with thresholds determined by various rules for applicability. Threshold selection is typically guided by the characteristics of the signal or noise. Entropy, a metric for quantifying uncertainty in information, can distinguish the deterministic features of whistle signals from random noise [39]. In complex noisy environments, whistle signals enhance the similarity between time samples [40]. Figure 3 presents the normalized autocorrelation results for dolphin whistle signals corrupted by noise and noise-only signals, based on the above findings. Whistle signals demonstrate higher regularity, while underwater noise in adjacent time segments exhibits significant randomness. To further investigate this relationship, we conducted correlation analyses between simulated whistle signals and various types of recorded ocean ambient noise. This analysis provides a theoretical foundation for noise suppression based on the correlation of whistle signals.

Let the clean whistle signal be represented as $x(n)$, the ocean ambient noise as $v(n)$, and the noisy signal as $y(n)$, as shown in Equation (4).

$$y(n) = x(n) + v(n) \tag{4}$$

The autocorrelation function of the combined signal is expressed in Equation (5). Here, $N$ represents the signal length, $n$ denotes the signal sequence index, and $k$ indicates the time delay. The cross terms $R_{xv}(k)$ and $R_{vx}(k)$ measure the similarity between signal and noise at different delays.

$$\begin{aligned} R_{yy}(k) &= \sum_{n=0}^{N-k-1} y(n)\, y(n+k) \\ &= R_{xx}(k) + R_{vv}(k) + R_{xv}(k) + R_{vx}(k) \end{aligned} \tag{5}$$
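The decomposition in Equation (5) follows directly from expanding the product $(x(n) + v(n))(x(n+k) + v(n+k))$, and it can be checked numerically. The sketch below uses a pure tone as a stand-in for a whistle and Gaussian samples as a stand-in for noise; both signals, the lag, and the helper name `cross_corr` are illustrative assumptions, not data or code from the study.

```python
import math
import random

def cross_corr(a, b, k):
    """R_ab(k) = sum_{n=0}^{N-k-1} a(n) * b(n + k)."""
    return sum(a[n] * b[n + k] for n in range(len(a) - k))

random.seed(0)
N = 512
x = [math.sin(2 * math.pi * 0.05 * n) for n in range(N)]   # stand-in tone
v = [random.gauss(0.0, 1.0) for _ in range(N)]              # stand-in noise
y = [xi + vi for xi, vi in zip(x, v)]                       # Equation (4)

k = 20
r_yy = cross_corr(y, y, k)
parts = (cross_corr(x, x, k), cross_corr(v, v, k),
         cross_corr(x, v, k), cross_corr(v, x, k))
print(r_yy, sum(parts))   # the two values agree up to rounding error
```

When signal and noise are only weakly correlated, the two cross terms are expected to be small compared with $R_{xx}(k)$, which is what makes the autocorrelation of the noisy signal informative about the whistle.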

Figure 4 shows the distribution of Spearman’s rank correlation coefficients between the simulated whistle signal and six types of ocean ambient noises, including biological (snapping shrimp and whale echolocation), anthropogenic (vessel and offshore wind turbine), and natural (wind and rain) noises. The results are visualized in a scatter plot, where $μ_{|r|}$ represents the mean of the absolute values of the correlation coefficients, and $σ_{|r|}^{2}$ indicates the variance of the absolute values of the correlation coefficients.

Figure 4 shows that the correlation coefficients are mainly distributed between −0.02 and 0.02. Additionally, the $μ_{|r|}$ values, representing the correlation between simulated whistle signals and the six noise types, are all below 0.005. This indicates that the frequency and amplitude variations of frequency-modulated whistle signals show weak associations with these typical noise types. Therefore, the correlation characteristics of whistle signals can be used to differentiate them from noise. Based on the above correlation analysis, the following sections will detail how to leverage the correlation characteristics of whistle signals for noise suppression and efficient signal extraction.
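For readers who want to reproduce this kind of summary, the sketch below computes Spearman’s rank correlation and the mean and variance of the absolute coefficients. It is a generic illustration: how the signals were segmented to obtain multiple coefficients is not described here, the population variance is assumed, and the helper names are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for idx in order[i:j + 1]:
            out[idx] = mean_rank
        i = j + 1
    return out

def spearman(a, b):
    """Pearson correlation of the ranks of a and b."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(va * vb)

def abs_summary(coeffs):
    """Mean and (population) variance of |r| over a set of coefficients."""
    mags = [abs(c) for c in coeffs]
    mean = sum(mags) / len(mags)
    var = sum((m - mean) ** 2 for m in mags) / len(mags)
    return mean, var

print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))   # monotone relation
```

Rank correlation only measures monotone association, so it is insensitive to the amplitude scaling of either signal.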

3. Denoising Based on Wavelet Level-Dependent Thresholding

In complex underwater environments, changes in the relative position between the target sound source and the hydrophone result in Doppler frequency shifts. Additionally, multipath delays and the time-varying nature of the underwater acoustic channel create temporal variations in ocean ambient noise. These factors collectively affect the frequency and time distribution of the acquired signals, potentially dispersing the energy of the desired signal and increasing the complexity of signal processing. Wavelet decomposition uses multiscale analysis to effectively reveal the signal’s content and structure, as shown in Figure 5. After extracting the signal content (detail coefficients at level 3) from the original signal, a significant magnitude difference is observed between the effective signal and noise in the wavelet sub-band $D_{3}$. This difference forms a solid foundation for wavelet threshold denoising.

In addition to thresholding, wavelet processing requires selecting appropriate decomposition levels and wavelet bases. Most dolphin whistles have a fundamental frequency range of 2 to 30 kHz [41]. Given the limited frequency range of dolphin whistles, selecting too many decomposition levels for wavelet decomposition is unnecessary. Increasing decomposition levels only refines the frequency band of the approximation coefficients in the lower range, without affecting the whistle signal’s high-frequency components. Therefore, the excessive subdivision and thresholding of wavelet coefficients that lack whistle components is unnecessary and only increases computational complexity. Considering the fundamental frequency range of dolphin whistles and the coarse frequency subdivisions of wavelet sub-bands in [42], we propose Equation (6) to determine the optimal SWT decomposition level for dolphin whistle signals.

$$f_{M} \geq \frac{f_{s}}{2^{l+1}} \tag{6}$$

The wavelet decomposition level is represented as $l$, the signal sampling rate as $f_{s}$, and the minimum fundamental frequency of the whistle signal as $f_{M}$. As shown in Figure 1, the choice of wavelet basis affects frequency-band aliasing, with some overlap observed between wavelet coefficients at different levels. Processing with the Daubechies wavelet reduces frequency-band overlap, and this reduction improves as the number of vanishing moments increases. Compared to other wavelet bases in the wavelet family, Daubechies wavelets offer a greater range of vanishing moments, allowing for more flexible selection based on signal-processing requirements. This property effectively minimizes mutual interference between wavelet coefficients at different levels during multi-level denoising. The vanishing moment is a key parameter that determines the accuracy of wavelet functions in approximating or estimating polynomials. A higher number of vanishing moments enhances wavelet compactness and effectively filters redundant information, but can cause overfitting if excessively high. A moderate number of vanishing moments is suitable for achieving a balance between accuracy and generalization in processing whistle signals.
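As a worked example of Equation (6), the sketch below returns the smallest level $l$ that satisfies the inequality, which is one reading of "preventing excessive decomposition". The 96 kHz sampling rate is a hypothetical value chosen for illustration; the 2 kHz lower bound comes from the whistle range quoted above. The function names are illustrative.

```python
def swt_level(f_min_hz, fs_hz):
    """Smallest level l with f_min >= fs / 2**(l + 1), per Equation (6)."""
    if f_min_hz <= 0 or fs_hz <= 0:
        raise ValueError("frequencies must be positive")
    level = 1
    while fs_hz / 2 ** (level + 1) > f_min_hz:
        level += 1
    return level

def detail_band(level, fs_hz):
    """Nominal band (low, high) in Hz covered by detail coefficients D_level."""
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level

fs = 96_000          # hypothetical sampling rate in Hz
f_m = 2_000          # minimum whistle fundamental frequency in Hz
l = swt_level(f_m, fs)
print(l, detail_band(l, fs))   # level 5, whose detail band spans 1500 to 3000 Hz
```

At this level the lowest detail band already contains $f_{M}$, so further decomposition would only subdivide frequencies below the whistle range.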

This section details the proposed SI-ACF denoising method, designed for the fundamental frequency of whistle signals under low SNR conditions. SWT decomposition offers a multiscale time–frequency framework for whistle signal denoising. Building on this, the SI-ACF method employs two strategies to reduce noise: (1) a multi-level sliding window method detects impulsive discontinuities and suppresses impulsive noise in wavelet detail coefficients; (2) an efficient threshold search leverages whistle signal correlation to reduce non-impulsive noise.

3.1. Threshold Search Algorithm Utilizing Signal Correlation

Distinguishing whistle signals from non-impulsive noise is crucial for threshold selection, as it directly influences the balance between preserving whistle signal integrity and removing noise. Wavelet decomposition divides whistle signals into components spanning multiple frequency bands. It not only separates whistle signals from noise but also provides a multiscale analysis of the differences between signals and noise, aiding in threshold selection. The complexity of underwater acoustic channels causes significant variations in noise across frequency bands and time intervals, necessitating appropriate thresholds for wavelet denoising at each level. Dynamic threshold setting is essential, considering the magnitude of wavelet coefficients at each level and the signal’s characteristics.

Building on the correlation analysis between whistle signals and typical ocean noise in Section 2.2, we propose a wavelet threshold denoising metric $ρ$ that does not require prior information. The metric is based solely on the correlation characteristics of whistle signals. The metric, defined through the autocorrelation function in Equation (5), is expressed in Equation (7):

$$ρ = \frac{R(k_{peak})}{R(0)}, \qquad k_{peak} = \max \{k \mid k > m,\ R(m) = 0,\ 0 < m < k\} \tag{7}$$

Here, $m$ represents the position where $R(k)$ first reaches zero, with $ρ$ ranging over (0, 1). The metric evaluates the overall correlation of the signal, reflecting the strength of its periodic components. It also characterizes the correlation features of the whistle signal’s fundamental frequency. In Figure 3, red dots represent the $ρ$ metric.
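One way to compute a metric of this kind is sketched below. It rests on an interpretation that should be treated as an assumption: $k_{peak}$ is taken as the lag of the largest autocorrelation value after the first zero crossing $m$, and a sign change is used in place of an exact zero because sampled autocorrelations rarely equal zero exactly. The function names and the default maximum lag are illustrative.

```python
import math
import random

def autocorr(sig, max_lag):
    """R(k) = sum_{n=0}^{N-k-1} sig(n) * sig(n + k) for k = 0..max_lag."""
    n = len(sig)
    return [sum(sig[i] * sig[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def rho_metric(sig, max_lag=None):
    """Largest autocorrelation value after the first zero crossing, over R(0).

    Returns 0.0 when R(k) never crosses zero within max_lag (assumed fallback).
    """
    if max_lag is None:
        max_lag = len(sig) // 2
    r = autocorr(sig, max_lag)
    if r[0] <= 0:
        return 0.0
    m = next((k for k in range(1, len(r)) if r[k] <= 0), None)
    if m is None or m + 1 >= len(r):
        return 0.0
    return max(0.0, max(r[m + 1:]) / r[0])

random.seed(1)
tone = [math.sin(2 * math.pi * n / 20) for n in range(400)]   # periodic stand-in
noise = [random.gauss(0.0, 1.0) for _ in range(400)]          # stand-in noise
print(rho_metric(tone), rho_metric(noise))
```

A strongly periodic input gives a value close to 1, while uncorrelated noise gives a value close to 0, which matches the qualitative behaviour described for Figure 3.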

Wavelet denoising removes high-frequency noise by processing detail coefficients while preserving approximation coefficients to maintain the signal’s main structure. However, due to the limited frequency range of the whistle signals, the approximation coefficients at the final level of the wavelet decomposition may not contain components of the whistle signals. Low-frequency wavelet coefficients primarily represent the signal’s overall trend and main structure. In wavelet level-dependent processing, unprocessed wavelet coefficients can interfere with the results of current-level processing, regardless of whether the sequence moves from high to low or low to high frequencies. Such interference may impact the stability of correlation-based denoising methods. To ensure reliability, the overall process of the proposed SI-ACF denoising method is shown in Figure 6.

The SI-ACF method for SWT-based multi-level denoising of whistle signals involves three steps:

(1): First, Equation (6) is used to determine the wavelet decomposition level l, preventing over-decomposition of wavelet sub-bands without whistle coefficients and avoiding excessive processing;
(2): Next, impulsive noise is suppressed by processing detail coefficients at each level using a sliding window, where detection is based on the segmented standard deviation of wavelet coefficients in descending order within the window; this approach reduces removal bias and selects the threshold based on neighboring amplitude levels, ensuring signal continuity without affecting other coefficients;
(3): Finally, wavelet coefficient thresholds for processing non-impulsive noise at each level are precisely selected based on the correlation metric $ρ$ from Equation (7), while an optimized processing sequence minimizes coefficient interference and enhances the stability of metric $ρ$.

The detailed steps of the wavelet level-dependent threshold denoising method for processing non-impulsive noise using the correlation metric $ρ$ are outlined below.

To minimize interference between wavelet coefficients across levels, the correlation metric $ρ$ is used to determine whether the approximation coefficients from the last decomposition level, as defined in Equation (6), contain whistle signal components. Performing $l$-level SWT decomposition yields a set of wavelet coefficients, $\{D_{1}, D_{2}, \ldots, D_{l}, A_{l}\}$. Next, the correlation results $P_{1} = \{ρ_{D_{1}}, ρ_{D_{2}}, \ldots, ρ_{D_{l}}, ρ_{A_{l}}\}$ are calculated for signals reconstructed from individual wavelet coefficients. Similarly, $P_{2} = \{ρ_{D_{1}}^{'}, ρ_{D_{2}}^{'}, \ldots, ρ_{D_{l}}^{'}, ρ_{A_{l}}^{'}\}$ represents the correlation results after excluding individual coefficients from the complete set. Based on this correlation analysis, approximation coefficients are processed using Equation (8) to improve the stability of the denoising process. The variable $w$ identifies the detail coefficient level that contains the primary components of the whistle signal. Within $P_{1}$, the detail coefficient corresponding to the maximum correlation value is defined as $D_{w}$, representing the whistle signal’s most correlated component.

$$A_{l} = \begin{cases} 0, & \text{if } w \neq l,\ w \neq l - 1,\ ρ_{A_{l}} \neq \max \{P_{1}\},\ ρ_{A_{l}}^{'} \neq \min \{P_{2}\}, \\ A_{l}, & \text{otherwise.} \end{cases} \tag{8}$$
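The decision rule in Equation (8) can be written as a small predicate. The sketch assumes that the comma-separated conditions are combined with a logical AND, which the equation does not state explicitly, and that $w$ is taken from the detail entries of $P_{1}$ only. The function name and the example values are hypothetical.

```python
def keep_approximation(p1, p2):
    """Decide whether A_l is kept (True) or set to zero (False), per Equation (8).

    p1: [rho_D1, ..., rho_Dl, rho_Al], computed from individual reconstructions.
    p2: the primed counterparts, computed after excluding each coefficient.
    Assumption: the conditions in Equation (8) are joined by logical AND.
    """
    if len(p1) != len(p2) or len(p1) < 2:
        raise ValueError("p1 and p2 must have equal length of at least 2")
    l = len(p1) - 1                                   # number of detail levels
    detail_rhos = p1[:l]
    w = detail_rhos.index(max(detail_rhos)) + 1       # 1-based level of D_w
    zero_out = (
        w != l and w != l - 1
        and p1[-1] != max(p1)
        and p2[-1] != min(p2)
    )
    return not zero_out

# Hypothetical values for a 5-level decomposition with the whistle mainly in D_2.
p1 = [0.10, 0.80, 0.20, 0.10, 0.05, 0.03]
p2 = [0.70, 0.20, 0.70, 0.70, 0.70, 0.75]
print(keep_approximation(p1, p2))   # A_5 is zeroed in this example
```

Under this reading, $A_{l}$ is discarded only when the whistle energy sits well above the lowest bands and $A_{l}$ neither carries the strongest correlation nor causes the largest drop when removed.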

If component $A_{l}$ is fully retained in the previous step, it should be prioritized during the second phase of level-dependent threshold denoising based on the correlation method. For the wavelet coefficients at each level, a specific threshold exists that maximizes the whistle signal correlation metric $ρ$. As noise is progressively removed, $ρ$ increases; however, when the effective signal is compromised, $ρ$ starts to decrease. To illustrate the threshold determination method based on signal correlation, take the detail coefficients $D_{i}$ (where $i \neq w$) from the wavelet decomposition as an example. The range $[0, \max |D_{i}|]$ is selected as the search interval for the optimal threshold. The trisection method is used to identify the threshold corresponding to the maximum $ρ$ value within this range, which is then applied to process the detail coefficients at the current level.

The final step focuses on processing $D_{w}$. Since $D_{w}$ is crucial for the whistle signal’s overall correlation, removing noise coefficients in the same frequency band can significantly influence the correlation. As noise gradually decreases, if part of the whistle signal is disrupted, the remaining segments may exhibit a higher correlation. Thus, redefining the threshold range is essential for optimizing denoising performance. According to [43], dolphin whistle signals typically last longer than 0.1 s. During the processing of detail coefficients $D_{w}$, the signal is segmented with a window length of $0.1 \cdot f_{s}$ samples. Absolute values within each window are averaged to identify the minimum average value ($M_{min}$) and the overall mean value ($M_{mean}$) of all windows. The trisection method is then applied within the threshold range $[M_{L}, M_{H}]$ to find the threshold that maximizes the correlation metric $ρ$, where $M_{H} = M_{mean} - (M_{mean} - M_{min}) / 2$ and $M_{L} = M_{min} / 2$, as shown in Figure 7. Finally, the optimal threshold is used to process $D_{w}$, and the processed wavelet coefficients are reconstructed to yield the denoised signal.
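The restricted search range for $D_{w}$ can be computed as in the sketch below. It assumes non-overlapping windows and drops a trailing partial window; the text only says the signal is "segmented", so both choices are assumptions, as is the function name.

```python
def dw_threshold_range(d_w, fs_hz, window_s=0.1):
    """Return (M_L, M_H) from windowed mean absolute values of D_w."""
    win = max(1, int(round(window_s * fs_hz)))
    means = []
    # Non-overlapping, full windows only (assumed segmentation scheme).
    for start in range(0, len(d_w) - win + 1, win):
        chunk = d_w[start:start + win]
        means.append(sum(abs(c) for c in chunk) / win)
    if not means:
        raise ValueError("D_w is shorter than one window")
    m_min = min(means)
    m_mean = sum(means) / len(means)
    m_high = m_mean - (m_mean - m_min) / 2   # midpoint of M_min and M_mean
    m_low = m_min / 2
    return m_low, m_high

# Toy coefficients: a quiet window followed by a louder one, at fs = 100 Hz.
d_w = [1.0] * 10 + [-3.0] * 10
print(dw_threshold_range(d_w, fs_hz=100))   # (0.5, 1.5)
```

Note that $M_{H}$ is simply the midpoint between $M_{min}$ and $M_{mean}$, so the search range stays below the average coefficient magnitude and is less likely to remove whistle segments.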

As illustrated in Figure 8 for processing the detail coefficient

D_{i}

, the specific steps for searching the optimal threshold using the trisection method are as follows:

Define the threshold search range as $[a_{1}, d_{1}]$, where $a_{1} = 0$ and $d_{1} = \max |D_{i}|$. Calculate the correlation result $ρ (a_{1})$ when the threshold is set to $a_{1}$, which corresponds to the scenario where the current level of detail coefficients is unprocessed. Similarly, calculate the correlation result $ρ (d_{1})$ when the threshold is set to $d_{1}$, representing the scenario where the current level of detail coefficients is entirely removed;
The search interval $[a_{k}, d_{k}]$ is iteratively narrowed using the trisection algorithm. For the k-th iteration, the trisection algorithm selects $b_{k} = a_{k} + \frac{1}{3} (d_{k} - a_{k})$ and $c_{k} = d_{k} - \frac{1}{3} (d_{k} - a_{k})$, and calculates the corresponding correlation results $ρ (b_{k})$ and $ρ (c_{k})$;
Compare the values of $ρ (a_{k})$, $ρ (b_{k})$, $ρ (c_{k})$ and $ρ (d_{k})$. If $ρ (b_{k})$ is the maximum, update the threshold search interval to $[a_{k}, c_{k}]$; if $ρ (c_{k})$ is the maximum, update the interval to $[b_{k}, d_{k}]$; if $ρ (a_{k})$ or $ρ (d_{k})$ is the maximum, update the interval to $[a_{k}, b_{k}]$ or $[c_{k}, d_{k}]$, respectively;
After each update of the threshold interval, k is incremented by 1, and the next iteration is performed until the termination condition in Equation (9) is satisfied:

$|d_{k} - a_{k}| < 10^{- 6}$

(9)

At this point, the threshold corresponding to the maximum value of the correlation metric $ρ$ within the threshold interval (as indicated by the peak point in Figure 8) is determined to be the optimal threshold for processing the detail coefficient $D_{i}$ .
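The trisection steps above can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation: `rho` is a hypothetical callable that returns the correlation metric for a given threshold (in the method itself this would involve thresholding the coefficients and evaluating the reconstructed signal), and `max_iter` is an added safety limit that does not appear in the text.

```python
def trisection_max(rho, a, d, tol=1e-6, max_iter=200):
    """Search [a, d] for the threshold that maximizes rho(threshold)."""
    rho_a, rho_d = rho(a), rho(d)
    for _ in range(max_iter):
        if abs(d - a) < tol:          # termination condition, Equation (9)
            break
        b = a + (d - a) / 3.0
        c = d - (d - a) / 3.0
        rho_b, rho_c = rho(b), rho(c)
        best = max(rho_a, rho_b, rho_c, rho_d)
        if best == rho_b:             # maximum at b: keep [a, c]
            d, rho_d = c, rho_c
        elif best == rho_c:           # maximum at c: keep [b, d]
            a, rho_a = b, rho_b
        elif best == rho_a:           # maximum at a: keep [a, b]
            d, rho_d = b, rho_b
        else:                         # maximum at d: keep [c, d]
            a, rho_a = c, rho_c
    # Return the endpoint with the larger metric value
    return a if rho_a >= rho_d else d
```

For a metric with a single peak inside the interval, the returned threshold lies within the tolerance of the peak location; each iteration shrinks the interval to at most two thirds of its previous width.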

3.2. Suppression of Impulsive Noise

Dolphin whistle signals recorded by hydrophones often contain random impulsive noise, as shown in Figure 2 and Figure 5. This noise may originate from the environment or dolphin click signals. Its presence interferes with extracting the characteristic parameters of whistle signals.

Figure 5 shows the amplitude distribution of the wavelet coefficients, which reveals that impulsive noise does not maintain high amplitudes across the entire frequency band. Using the signal in Figure 2 as an example, the optimal-level wavelet decomposition based on Equation (6) was applied, and all detail coefficients were rearranged in descending order. The reordered curves of detail coefficients across all levels show a consistent trend. Figure 9 shows the amplitude distribution of detail coefficients $D_{3}$, while Figure 10 depicts the curve after rearrangement in descending order. Abrupt high-amplitude variations at the curve’s ends are concentrated in very short intervals, corresponding to wavelet coefficients of impulsive noise. Noise regions marked in red in Figure 9 appear as flat, low-amplitude areas in Figure 10, representing the main components of non-impulsive noise. In summary, the primary wavelet coefficients of whistle signals differ from noise coefficients in amplitude. Combined with the amplitude distribution of wavelet coefficients in Figure 5, this feature offers a clear basis for distinguishing and removing both impulsive and non-impulsive noise.

Impulsive noise shows differences between low and high amplitude values, as illustrated in Figure 9. Selecting the threshold based on the overall descending order of detail coefficients at each level may fail to remove low-amplitude impulsive noise effectively. To detect impulsive noise coefficients and set appropriate thresholds, this subsection introduces a sliding window method for the threshold processing of detail coefficients at each level, as shown in Figure 11. The sliding window method mitigates removal biases caused by amplitude differences between low- and high-amplitude impulsive noise coefficients.

The proposed impulsive noise suppression method focuses on identifying significant magnitude change points in the descending-order curve, represented as the positive and negative magnitude parts in Figure 10, denoted by $t_{j}^{+}$ and $t_{j}^{-}$. Change points within each sliding window are detected using the MATLAB R2023b findchangepts function [44,45]. This function detects change points $ipt$ within each window by evaluating segment differences and maximizing standardized variance differences between segments. For a window of length $L_{N}$, the method divides the positive magnitude part of the descending-order curve into two intervals: $[1, ipt - 1]$ and $[ipt, L_{N} / 2]$, using the positive magnitude part as an example. The change point $ipt$ and the corresponding threshold $t_{j, ℓ}^{+}$ are calculated based on the standard deviation differences between these intervals. The same procedure is applied to the negative magnitude part to calculate the threshold $t_{j, ℓ}^{-}$.
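To make the change-point idea concrete, the following Python sketch locates a single change point in a descending-order curve. It is a simplified stand-in and not MATLAB's findchangepts: it assumes exactly one change point and picks the split that minimizes the total within-segment squared deviation from the segment means. The function names are hypothetical, and taking the threshold as the curve value at the change point is an assumption made here for illustration.

```python
def single_change_point(values):
    """Return k (1 <= k < n) so that values[:k] and values[k:] have minimal
    total squared deviation from their own means (one change point only)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    # Prefix sums make each candidate split O(1) to evaluate
    prefix, prefix_sq = [0.0], [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def sse(i, j):
        # Sum of squared deviations of values[i:j] from their mean
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(0, k) + sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


def positive_threshold(window):
    """Threshold for the positive part of one window (illustrative only)."""
    curve = sorted((v for v in window if v > 0), reverse=True)
    k = single_change_point(curve)  # 0-based k corresponds to ipt - 1
    return curve[k]                 # assumption: threshold = curve value at ipt
```

The negative part of a window could be handled in the same way after negating the coefficients.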

Here, $t_{j, ℓ}^{+}$ and $t_{j, ℓ}^{-}$ denote the thresholds for the positive and negative sections of window ℓ of the level j detail coefficients, respectively. Finally, impulsive noise coefficients within the window are eliminated based on Equation (10):

$D_{j} (n) = \begin{cases} μ_{j, ℓ}^{+}, & D_{j} (n) > t_{j, ℓ}^{+}, \\ μ_{j, ℓ}^{-}, & D_{j} (n) < t_{j, ℓ}^{-}, \\ D_{j} (n), & \text{otherwise}, \end{cases} \quad \forall n \in W_{j, ℓ}$

(10)

where $W_{j, ℓ}$ represents the ℓ-th window of the level j detail coefficients,

$W_{j, ℓ} = \{D_{j} (n) \mid ℓ \cdot S \leq n < ℓ \cdot S + L\}, \quad ℓ = 0, 1, \ldots, K - 1,$

(11)

Here, L represents the window length, set to 10% of the signal sampling rate (i.e., $0.1 \cdot f_{s}$ samples), S denotes the step size, equal to half of the window length, and K represents the total number of windows. $μ_{j, ℓ}^{+}$ and $μ_{j, ℓ}^{-}$ represent the mean values of all positive and negative numbers within the window, respectively, as shown in Equations (12) and (13).

$μ_{j, ℓ}^{+} = \frac{1}{|W_{j, ℓ}^{+}|} \sum_{n \in W_{j, ℓ}^{+}} D_{j} (n), \quad W_{j, ℓ}^{+} = \{D_{j} (n) \in W_{j, ℓ} \mid D_{j} (n) > t_{j, ℓ}^{+}\}$

(12)

$μ_{j, ℓ}^{-} = \frac{1}{|W_{j, ℓ}^{-}|} \sum_{n \in W_{j, ℓ}^{-}} D_{j} (n), \quad W_{j, ℓ}^{-} = \{D_{j} (n) \in W_{j, ℓ} \mid D_{j} (n) < t_{j, ℓ}^{-}\}$

(13)
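The windowing of Equation (11) and the replacement rule of Equation (10) can be sketched in Python as follows. The function names are hypothetical. The thresholds and the replacement means $μ_{j, ℓ}^{\pm}$ are passed in as arguments rather than computed, and keeping only full-length windows is an assumption, since the text does not state how a trailing partial window is treated.

```python
def window_bounds(n_samples, length, step):
    """Return (start, stop) index pairs for the windows of Equation (11).

    Assumption: only full-length windows are kept, so the number of windows
    is K = (n_samples - length) // step + 1 when n_samples >= length.
    """
    if n_samples < length:
        return []
    return [(s, s + length) for s in range(0, n_samples - length + 1, step)]


def apply_eq10(window, t_pos, t_neg, mu_pos, mu_neg):
    """Replace coefficients outside [t_neg, t_pos] following Equation (10)."""
    out = []
    for v in window:
        if v > t_pos:
            out.append(mu_pos)   # positive impulsive coefficient
        elif v < t_neg:
            out.append(mu_neg)   # negative impulsive coefficient
        else:
            out.append(v)        # coefficient left unchanged
    return out
```

With a step of half the window length, consecutive windows overlap by 50%, so each coefficient away from the signal edges is examined in two windows.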

Unlike median filtering, the proposed method prevents energy dispersion in whistle signals caused by signal averaging. Compared to clipping and wave-shaping, the sliding window with an adaptive threshold modulation strategy selectively removes impulsive noise coefficients. This method improves flexibility and accuracy while maintaining the integrity and continuity of the whistle signal.

4. Results and Discussion

To comprehensively evaluate the proposed SI-ACF method’s denoising performance, validation experiments were conducted using both simulated and real whistle signals. Simulated whistle signals were constructed using chirp-like signals, allowing the generation of different whistle types, with Gaussian envelopes applied to further mimic the frequency and amplitude variations characteristic of real dolphin whistles. Simulated whistle signals provide a clean reference signal, making them suitable for quantifying denoising effectiveness under various noise levels and types. In contrast, real whistle signals lack a noise-free reference, making it difficult to compare numerical results.

For the simulated whistle signal $x (n)$, three types of noise $d (n)$ (vessel noise, wind and wave noise, and background noise) measured in Xiamen Bay were used. These three typical noise types were selected based on the different frequency ranges and sound-pressure spectrum levels of ocean noise summarized in the Wenz curve [46], as well as commonly encountered and representative noise in real-world scenarios. Based on Equation (14), the simulated whistle signal was combined with these noise types at varying SNR levels to generate noisy signals. The synthesized signals were processed using the thresholding rule and denoising strategy of the proposed SI-ACF method, along with the Sqtwolog, Rigrsure, Heursure, Minimaxi, and Bayes thresholds [47], to obtain the denoised signals $\tilde{x} (n)$. The output SNR was calculated using Equation (15) to quantify the quality of the denoised signal. Furthermore, the normalized root mean square error (NRMSE) from Equation (16) was used to measure the difference between the denoised and reference signals, while the Pearson correlation coefficient (PCC) from Equation (17) assessed their degree of linear association.

$SNR_{in} = 10 \log_{10} \frac{\sum_{k = 1}^{n} x^{2} (k)}{\sum_{k = 1}^{n} d^{2} (k)}$

(14)

$SNR_{out} = 10 \log_{10} \frac{\sum_{k = 1}^{n} x^{2} (k)}{\sum_{k = 1}^{n} {(x (k) - \tilde{x} (k))}^{2}}$

(15)

$NRMSE = \frac{\sqrt{\frac{1}{n} \sum_{k = 1}^{n} {(x (k) - \tilde{x} (k))}^{2}}}{\max (x (n)) - \min (x (n))}$

(16)

$r = \frac{\mathrm{cov} (x, \tilde{x})}{σ_{x} σ_{\tilde{x}}}$

(17)

Here, $\mathrm{cov} (x, \tilde{x})$ represents the covariance between the reference signal and the denoised result, and $σ_{x}$ and $σ_{\tilde{x}}$ represent the standard deviations of these signals, respectively. Additionally, the metric $ρ$ is calculated to evaluate the denoised results. Combined with the other evaluation metrics, it offers additional validation for the effectiveness of the proposed method.
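As a minimal illustration of Equations (15)–(17), the three reference-based metrics can be computed in plain Python as sketched below. The function names are hypothetical, and the sketch assumes that both sequences have equal length and that the denoised signal is not identical to the reference (otherwise the output SNR is undefined).

```python
import math


def output_snr_db(x, x_hat):
    """Output SNR in dB, Equation (15)."""
    signal_energy = sum(v * v for v in x)
    error_energy = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 10.0 * math.log10(signal_energy / error_energy)


def nrmse(x, x_hat):
    """Root mean square error normalized by the reference range, Equation (16)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return math.sqrt(mse) / (max(x) - min(x))


def pearson(x, x_hat):
    """Pearson correlation coefficient, Equation (17)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(x_hat) / len(x_hat)
    # The 1/n factors of covariance and standard deviations cancel out
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, x_hat))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in x_hat))
    return cov / (dev_x * dev_y)
```

A denoised signal that is a scaled copy of the reference has a PCC of 1 but a finite output SNR, which is one reason the metrics are reported together.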

4.1. Denoising Results for Simulated Whistles with Three Typical Noise Types

The proposed SI-ACF method is designed to preprocess and denoise dolphin whistle signals in underwater scenarios with low SNR conditions. To ensure the reliability and generalizability of our findings, the denoising performance was evaluated across multiple sets of simulated whistle signals of different types combined with different noise conditions. Synthetic whistle signals with input SNR values from −20 to 0 dB were used as experimental subjects in the simulation experiments. The simulated whistle signals adopted in subsequent analyses were configured with a duration of 0.8 s, a sampling rate of 96 kHz, and a frequency range of 6.5–9 kHz. To ensure fair result comparisons, all thresholding denoising methods were applied under consistent SWT decomposition conditions. Specifically, the db20 wavelet basis, decomposition levels determined by Equation (6), and soft thresholding were applied. Additionally, all wavelet coefficients, including the final level’s approximation coefficients, were included in the processing.
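The following Python sketch illustrates how a simulated whistle with the stated duration, sampling rate, and frequency range might be generated and mixed with noise at a target input SNR according to Equation (14). It is an assumption-laden illustration: the linear upsweep, the Gaussian envelope width (`duration / 4`), and the function names are choices made here and are not taken from the paper, and any noise sequence of equal length can be supplied in place of the measured recordings.

```python
import math


def simulated_whistle(fs=96_000, duration=0.8, f0=6_500.0, f1=9_000.0):
    """Linear chirp from f0 to f1 Hz with a Gaussian amplitude envelope."""
    n = int(fs * duration)
    sweep_rate = (f1 - f0) / duration   # Hz per second
    centre = duration / 2.0
    sigma = duration / 4.0              # assumed envelope width
    samples = []
    for i in range(n):
        t = i / fs
        phase = 2.0 * math.pi * (f0 * t + 0.5 * sweep_rate * t * t)
        envelope = math.exp(-0.5 * ((t - centre) / sigma) ** 2)
        samples.append(envelope * math.sin(phase))
    return samples


def mix_at_snr(x, d, snr_db):
    """Scale noise d so that x + gain * d has the input SNR of Equation (14)."""
    energy_x = sum(v * v for v in x)
    energy_d = sum(v * v for v in d)
    gain = math.sqrt(energy_x / (energy_d * 10.0 ** (snr_db / 10.0)))
    return [a + gain * b for a, b in zip(x, d)]
```

Calling `mix_at_snr(x, d, -20.0)` through `mix_at_snr(x, d, 0.0)` would cover the input SNR range used in the simulation experiments.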

4.1.1. Vessel Noise

Ship noise is a common anthropogenic noise source in underwater environments, and its underwater radiated noise typically includes mechanical noise, propeller and flow noise, and noise generated by cavitation effects [48]. Among these, propeller cavitation is the primary source of ship underwater radiated noise [49], and as observed from the Wenz curve, the spectral range of this broadband noise overlaps with the communication frequency range of dolphin whistles.

Figure 12a shows that the SI-ACF method significantly outperforms other thresholding methods under low SNR conditions. For input SNRs between −18 dB and 0 dB, it achieves over 5 dB improvement compared to Sqtwolog and Minimaxi, and about 3 dB better performance than Bayes and Heursure. At −20 dB, it performs slightly worse than Bayes and Heursure. Figure 12b supports the same conclusion. The Sqtwolog and Minimaxi methods, which rely on noise estimation for threshold selection, struggle in low-SNR underwater environments because the noise varies significantly over time. This limitation primarily stems from estimating noise statistics from global wavelet coefficients, which averages out the temporal differences in noise characteristics. Such averaging leads to an inaccurate noise-level estimate, which in turn biases the threshold selection and degrades denoising performance. The SI-ACF method, by contrast, adaptively selects thresholds by leveraging the whistle signal’s correlation characteristics across the entire time segment to dynamically distinguish it from noise.

Figure 13a,b show the results of two correlation metrics, demonstrating that the SI-ACF method consistently achieves higher correlation than other thresholding methods. This differs from the results in Figure 12 under −20 dB conditions, likely due to the enhancement of ship spectral line noise, a type of narrowband noise dominated by low-frequency fundamental tones and their harmonics, typically generated by periodic mechanical vibrations such as engine rotation or propeller cavitation. This narrowband noise appears in the frequency domain as regular peaks (such as fundamental and harmonic components), which introduces bias in the threshold selection for the correlation metric $ρ$. Low-frequency noise with a trend can influence the threshold selection of the correlation metric $ρ$, leading to a reduction in denoising performance. Nevertheless, Figure 13 shows that the SI-ACF method surpasses other thresholding techniques in preserving whistle signal integrity, as evidenced by both the PCC and the $ρ$ metric.

4.1.2. Wind and Wave Noise

In the low-frequency range of typical underwater spectra, ship traffic noise dominates, while above this frequency range, wind-driven wave sounds become predominant [50]. The size of wind waves is determined by three factors: wind strength, wind duration, and fetch (i.e., the distance over which the wind blows across uninterrupted water surface in a constant direction).

Figure 14 shows that the SI-ACF method consistently outperforms other methods, while Sqtwolog and Minimaxi exhibit the poorest performance. The Bayes thresholding method struggles to select the optimal threshold based on the prior distribution of wind and wave noise coefficients, leading to poorer denoising performance in this noise scenario compared to vessel noise scenarios. The primary frequency bands of wind and wave noise substantially overlap with the whistle signals’ fundamental frequency, complicating the threshold selection for the Bayes method. Conversely, vessel noise is predominantly confined to low-frequency ranges, facilitating more accurate threshold estimation for the Bayes approach under these conditions. In terms of output SNR and NRMSE, the SI-ACF method demonstrates stable denoising performance under wave noise conditions. Wind and wave noise exhibit non-Gaussian, non-stationary characteristics, with a time-varying spectral energy distribution driven by environmental dynamics. These noise properties result in the more severe degradation of whistle signals within overlapping frequency bands; however, the SI-ACF method reliably overcomes these challenges.

Figure 15a shows that the SI-ACF method achieves a PCC consistently above 0.8, indicating a high level of linear correlation between the denoised results and the reference whistle signal. This high correlation is further corroborated by the proposed correlation metric $ρ$, as shown in Figure 15b, which attains values above 0.69 across all tested SNR levels. In contrast, the Sqtwolog and Minimaxi methods both have PCC values below 0.3 and $ρ$ values below 0.35 within the −20 dB to −16 dB SNR range. Such low correlations indicate that these conventional methods over-suppress valid signal components during denoising, severely compromising the integrity of the whistle structure. The SI-ACF method employs correlation to differentiate between noise-dominant and signal-dominant regions, thereby avoiding excessive denoising.

4.1.3. Background Noise

Background noise is composed of a mixture of various noise sources in the marine environment, reflecting the complex and dynamic acoustic conditions underwater. These sources include a combination of natural and anthropogenic components.

Figure 16 shows that the SI-ACF method significantly outperforms other thresholding methods in both output SNR and NRMSE. Unlike the other two noise types, background noise not only comprises low-frequency components and frequency bands that overlap with whistle signals but also includes impulsive noise. The complexity of these noise components results in smaller improvements in output SNR and NRMSE compared to those observed with the other two types of noise. Nevertheless, the SI-ACF thresholding method effectively balances the removal of both impulsive and non-impulsive noise characteristics. Additionally, differences among other thresholding methods are minor, with only Sqtwolog and Minimaxi performing the worst within the −12 dB to 0 dB range.

Background noise more accurately represents the environment where the whistle signal is captured. Figure 17 shows that the PCC and the proposed correlation metric $ρ$ demonstrate the SI-ACF method’s superior ability to preserve the whistle signal’s main structure during denoising. Figure 17a shows that other thresholding methods cannot effectively separate noise from the whistle signal, leading to a low linear correlation with the reference signal at SNRs below −16 dB. Figure 17b presents the correlation assessment of the denoised results using the proposed metric $ρ$, revealing that other methods significantly degrade the whistle signal’s main structure, resulting in lower $ρ$ values.

Table 1 summarizes the denoising performance at −10 dB input SNR across three noise types. The numerical results clearly show that the denoising performance of each method deteriorates under background noise conditions, as the shallow sea background noise in the inactive neighborhood of the whistle signal is more complex, with more dramatic variations in the same frequency band. However, SI-ACF still maintains the best denoising capability, effectively distinguishing whistle content that has been obscured by noise, and leveraging the high energy of impulse noise and the inherent correlation characteristics of the whistle signal. In contrast, noise estimation or wavelet coefficient characteristic-based thresholding methods fail to accurately determine the threshold under conditions where both noise and whistle signals vary in the time–frequency domain.

4.2. Denoising Results for Real Whistles

To denoise real dolphin whistle signals, we utilized a dataset from Fremantle Inner Harbour [6] and analyzed the processed signals using PAMGuard software (version 2.02.14) [51]. This whistle detector identifies whistles by searching for the energy ridges in the sound. However, the energy level of the whistle signals is weak in this dataset, which increases the difficulty of detecting them. Consequently, 40 audio files containing 103 whistle signals, representing various whistle structures, were manually selected from the dataset using Raven Pro 1.6 software (Cornell Laboratory of Ornithology, Ithaca, NY, USA) for the experiment. The PAMGuard configuration is detailed in Table 2.

Due to the complex ambient noise in this dataset, other thresholding methods based on level-dependent processing are prone to erroneously deleting valid signals. Therefore, the best results from level-independent or level-dependent methods among these methods are selected for comparison. Figure 18 shows the whistle signal Wh_0052, which demonstrates that the whistle signal is severely interfered with by impulsive noise and contains ship spectral line noise.

Figure 19 compares the time–frequency spectrograms of the denoised results obtained using various thresholding methods. Only the proposed SI-ACF method accurately identifies the whistle signal’s frequency band and removes irrelevant wavelet coefficients, while the other methods retain some irrelevant coefficients. Furthermore, during denoising within the same frequency band, the SI-ACF method avoids over-denoising, preserving more whistle energy and enhancing the visualization of the whistle signal in the time–frequency spectrogram. Noise estimation-based methods prioritize noise reduction over preserving valid signals, leading to imbalanced threshold determination. Among the other thresholding methods, all except Bayes and Minimaxi visibly weaken the whistle signal components. Bayes and Minimaxi retain more noise because level-dependent methods tend to over-denoise, while level-independent methods are insufficient for suppressing most noise. While impulsive noise coefficients minimally affect numerical metrics such as output SNR and NRMSE, the time–frequency spectrogram distinctly demonstrates the SI-ACF method’s superior ability to remove these coefficients. Furthermore, the second whistle in Figure 18 exhibits a breakpoint at 2.5 s. The SI-ACF method does not enlarge this gap during denoising, whereas the other methods, which denoise excessively, widen it, potentially causing the whistle signal to be identified as two separate signals.

When detecting whistle contours with PAMGuard, a single whistle signal may be split into multiple signals, noise might be misclassified as a whistle, and some whistles might be missed altogether. Using the 103 selected whistles as the reference set (denoted as $W_{total}$), we recorded the number of correctly identified whistles ($W_{correct}$), incorrectly identified whistles ($W_{false}$), and missed whistles ($W_{missed}$). Since the frequency range of whistle detection was predefined and the whistle count was small, all methods showed minimal false detection. Moreover, PAMGuard’s configuration allows modifications to lower the false detection rate. The denoising performance was evaluated using Equation (18) for the effective detection rate (EDR) and Equation (19) for the missed detection rate (MDR).

$EDR = \frac{W_{total} - W_{false} - W_{missed}}{W_{correct} + W_{false}}$

(18)

$MDR = \frac{W_{missed}}{W_{total}}$

(19)
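Equations (18) and (19) translate directly into a short Python helper, sketched below. The function name is hypothetical, and the counts used in the usage note are invented for illustration; they are not values from Table 3.

```python
def detection_rates(w_total, w_correct, w_false, w_missed):
    """Return (EDR, MDR) as defined in Equations (18) and (19)."""
    edr = (w_total - w_false - w_missed) / (w_correct + w_false)
    mdr = w_missed / w_total
    return edr, mdr
```

For example, with illustrative counts of 100 reference whistles, 80 correct detections, 5 false detections, and 15 missed whistles, `detection_rates(100, 80, 5, 15)` evaluates the EDR as 80/85 and the MDR as 15/100.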

The PAMGuard detection results obtained without wavelet denoising as preprocessing are also presented for comparison. As shown in Table 3, after denoising, the SI-ACF method achieved the highest effective detection rate and the lowest missed detection rate. The reduction in missed detections likely results from attenuating the noise surrounding weak whistle signals, which enhances the energy components of those signals. In contrast, the effective detection rate of the Sqtwolog method was lower than the result without wavelet preprocessing, while the Minimaxi method improved the effective detection rate by only 0.05% over the no-preprocessing case, further confirming that noise estimation threshold methods struggle to adapt to underwater noise environments. The SI-ACF method improved the EDR by nearly 14.78%, while methods such as Rigrsure, Heursure, and Bayes showed minimal improvement. This situation can be attributed to two main reasons. First, under low SNR, the weak energy at the starting, ending, and some turning points of the whistle signal resulted in some components of the whistle not being effectively detected, leading to single whistle signals being counted as multiple signals. Second, the presence of impulsive noise disrupted the continuity of the whistle signal in the time–frequency domain, also causing single whistle signals to be split. Therefore, removing impulsive noise is crucial for improving the accuracy of whistle detection. In summary, the SI-ACF wavelet threshold denoising method effectively removes irrelevant noise by applying precise thresholds, while preserving the integrity of the whistle signals and enhancing the effectiveness of automatic whistle detection.

5. Conclusions

This study proposes a wavelet thresholding denoising method based on signal and noise characteristic estimation (SI-ACF) for dolphin whistle fundamental frequency signals in low-SNR underwater environments. The method is designed to replace traditional thresholding approaches that rely on noise-level estimation or wavelet coefficient features, enabling precise threshold determination while preserving the integrity of whistle signals. By utilizing the frequency range of dolphin whistles to determine and limit the decomposition levels in SWT, over-processing is effectively prevented. The proposed method also suppresses impulsive noise by incorporating amplitude levels from adjacent time segments, thereby preventing over-denoising. Furthermore, it leverages the correlation characteristics of whistle signals to achieve accurate signal estimation, effectively retaining signal components during noise suppression.

The SI-ACF method was validated using simulated whistles combined with three types of typical underwater noise and real whistle data. The results show that SI-ACF overcomes the limitations of noise-estimation-based wavelet denoising methods in time-varying underwater noise environments. However, the autocorrelation-based metric $ρ$ is currently limited to processing the fundamental frequency of whistle signals. Under low-SNR conditions, accurately identifying harmonic components of whistle signals remains challenging, even for human operators. Furthermore, when noise contains coherent components, the SI-ACF method may yield insufficient denoising performance. The proposed method, however, shows strong potential for broader applications to other periodic or quasi-periodic signals. Future work will focus on adaptively optimizing the thresholding function to improve same-frequency denoising accuracy for nonlinear, non-stationary whistle signals in dynamic noise environments. This work not only improves the accuracy of wavelet-based denoising but also contributes to understanding how environmental variations influence bioacoustic behaviors and shape species-specific vocalization patterns, providing valuable insights into adaptive mechanisms and supporting conservation efforts.

Author Contributions

Conceptualization, X.Z. and P.Z.; methodology, W.C. and X.Z.; software, P.Z. and X.X.; validation, X.Z. and R.W.; formal analysis, M.D. and X.Z.; investigation, X.Z., M.D. and P.Z.; resources, P.Z. and X.X.; data curation, P.Z. and X.X.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., R.W. and P.Z.; visualization, X.Z. and R.W.; supervision, W.C. and P.Z.; project administration, M.D. and P.Z.; funding acquisition, W.C. and P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Natural Science Foundation of Fujian Science and Technology Plan under Grant [2022J01824] and Xiamen Science and Technology Subsidy Project [No.2023CXY0304].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

Baptista, G.; Kehrig, H.A.; Di Beneditto, A.P.M.; Hauser-Davis, R.A.; Almeida, M.G.; Rezende, C.E.; Siciliano, S.; de Moura, J.F.; Moreira, I. Mercury, selenium and stable isotopes in four small cetaceans from the Southeastern Brazilian coast: Influence of feeding strategy. Environ. Pollut. 2016, 218, 1298–1307.
Hazen, E.L.; Abrahms, B.; Brodie, S.; Carroll, G.; Jacox, M.G.; Savoca, M.S.; Scales, K.L.; Sydeman, W.J.; Bograd, S.J. Marine top predators as climate and ecosystem sentinels. Front. Ecol. Environ. 2019, 17, 565–574.
Herzing, D.L. Clicks, whistles and pulses: Passive and active signal use in dolphin communication. Acta Astronaut. 2014, 105, 534–537.
Rege-Colt, M.; Oswald, J.N.; De Weerdt, J.; Palacios-Alfaro, J.D.; Austin, M.; Gagne, E.; Morán Villatoro, J.M.; Sahley, C.T.; Alvarado-Guerra, G.; May-Collado, L.J. Whistle repertoire and structure reflect ecotype distinction of pantropical spotted dolphins in the Eastern Tropical Pacific. Sci. Rep. 2023, 13, 13449.
Wang, X.y.; Jiang, Y.; Liu, Z.w.; Yang, C.m.; Chen, B.y.; Lü, L.g. Three types of pulsed signal trains emitted by Indo-Pacific humpback dolphins (Sousa chinensis) in Beibu Gulf, South China Sea. Front. Mar. Sci. 2022, 9, 915668.
Marley, S.A.; Erbe, C.; Kent, C.P.S. Underwater recordings of the whistles of bottlenose dolphins in Fremantle Inner Harbour, Western Australia. Sci. Data 2017, 4, 170126.
Sayigh, L.S.; Janik, V.M.; Jensen, F.H.; Scott, M.D.; Tyack, P.L.; Wells, R.S. The Sarasota Dolphin Whistle Database: A unique long-term resource for understanding dolphin communication. Front. Mar. Sci. 2022, 9, 923046.
Yuan, J.; Wang, Z.; Duan, P.; Xiao, Y.; Zhang, H.; Huang, Z.; Zhou, R.; Wen, H.; Wang, K.; Wang, D. Whistle signal variations among three Indo-Pacific humpback dolphin populations in the South China Sea: A combined effect of the Qiongzhou Strait’s geographical barrier function and local ambient noise? Integr. Zool. 2021, 16, 499–511.
Dong, L.; Caruso, F.; Lin, M.; Liu, M.; Gong, Z.; Dong, J.; Cang, S.; Li, S. Whistles emitted by Indo-Pacific humpback dolphins (Sousa chinensis) in Zhanjiang waters, China. J. Acoust. Soc. Am. 2019, 145, 3289–3298.
Perez-Ortega, B.; Daw, R.; Paradee, B.; Gimbrere, E.; May-Collado, L.J. Dolphin-watching boats affect whistle frequency modulation in bottlenose dolphins. Front. Mar. Sci. 2021, 8, 618420.
Li, C.; Jiang, J.; Wang, X.; Sun, Z.; Li, Z.; Fu, X.; Duan, F. Bionic covert underwater communication focusing on the overlapping of whistles and clicks generated by different cetacean individuals. Appl. Acoust. 2021, 183, 108279.
Jiang, J.; Yao, Z.; Li, Z.; Lu, Y.; Yao, Q.; Gong, X.; Fu, X.; Duan, F. Recognition method for the bionic camouflage cetacean whistle modulated by CPMFSK signals. Appl. Acoust. 2023, 207, 109326.
Mattmüller, R.M.; Thomisch, K.; Van Opzeeland, I.; Laidre, K.L.; Simon, M. Passive acoustic monitoring reveals year-round marine mammal community composition off Tasiilaq, Southeast Greenland. J. Acoust. Soc. Am. 2022, 151, 1380–1392.
Findlay, C.R.; Rojano-Doñate, L.; Tougaard, J.; Johnson, M.P.; Madsen, P.T. Small reductions in cargo vessel speed substantially reduce noise impacts to marine mammals. Sci. Adv. 2023, 9, eadf2987.
Serra, O.M.; Martins, F.; Padovese, L.R. Active contour-based detection of estuarine dolphin whistles in spectrogram images. Ecol. Inform. 2020, 55, 101036.
Kipnis, D.; Diamant, R. Graph-based clustering of dolphin whistles. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 2216–2227.
Li, P.; Roch, M.A.; Klinck, H.; Fleishman, E.; Gillespie, D.; Nosal, E.M.; Shiu, Y.; Liu, X. Learning stage-wise GANs for whistle extraction in time-frequency spectrograms. IEEE Trans. Multimed. 2023, 25, 9302–9314.
Mallawaarachchi, A.; Ong, S.; Chitre, M.; Taylor, E. Spectrogram denoising and automated extraction of the fundamental frequency variation of dolphin whistles. J. Acoust. Soc. Am. 2008, 124, 1159–1170.
Gruden, P.; White, P.R. Automated extraction of dolphin whistles—A sequential Monte Carlo probability hypothesis density approach. J. Acoust. Soc. Am. 2020, 148, 3014–3026.
Gruden, P.; White, P.R. Automated tracking of dolphin whistles using Gaussian mixture probability hypothesis density filters. J. Acoust. Soc. Am. 2016, 140, 1981–1991.
Beale, C.; Niezrecki, C.; Inalpolat, M. An adaptive wavelet packet denoising algorithm for enhanced active acoustic damage detection from wind turbine blades. Mech. Syst. Signal Process. 2020, 142, 106754.
Alsalah, A.; Holloway, D.; Mousavi, M.; Lavroff, J. Identification of wave impacts and separation of responses using EMD. Mech. Syst. Signal Process. 2021, 151, 107385.
Li, J.; Chen, Y.; Qian, Z.; Lu, C. Research on VMD based adaptive denoising method applied to water supply pipeline leakage location. Measurement 2020, 151, 107153.
Yao, Q.; Wang, Y.; Yang, Y. Underwater acoustic target recognition based on Hilbert–Huang transform and data augmentation. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 7336–7353.
Luo, Z.; Ding, S.; Tan, C.; Xu, B.; Lu, B.; Huang, J. Low-frequency fiber optic hydrophone based on ultra-weak fiber Bragg grating. IEEE Sens. J. 2023, 23, 11635–11641.
Lindenbaum, O.; Steinerberger, S. Refined least squares for support recovery. Signal Process. 2022, 195, 108493.
Lara, G.; Bou-Cabo, M.; Llorens, S.; Miralles, R.; Espinosa, V. Acoustical behavior of delphinid whistles in the presence of an underwater explosion event in the Mediterranean coastal waters of Spain. J. Mar. Sci. Eng. 2023, 11, 780.
Kragh, I.M.; McHugh, K.; Wells, R.S.; Sayigh, L.S.; Janik, V.M.; Tyack, P.L.; Jensen, F.H. Signal-specific amplitude adjustment to noise in common bottlenose dolphins (Tursiops truncatus). J. Exp. Biol. 2019, 222, jeb216606.
Kumar, A.; Tomar, H.; Mehla, V.K.; Komaragiri, R.; Kumar, M. Stationary wavelet transform based ECG signal denoising method. ISA Trans. 2021, 114, 251–262.
Kozhamkulova, F.; Akhtar, M.T. A Hybrid Approach to Enhanced Signal Denoising Using Data-Driven Multiresolution Analysis with Detrended-Fluctuation-Analysis-Based Thresholding and Stationary Wavelet Transform. Appl. Sci. 2024, 14, 10866.
Zhou, S.; Zhang, Z.X.; Luo, X.; Niu, S.; Jiang, N.; Yao, Y. Developing a hybrid CEEMDAN-PE-HE-SWT method to remove the noise of measured carbon dioxide blast wave. Measurement 2023, 223, 113797.
Bach, N.H.; Vu, L.H.; Nguyen, V.D.; Pham, D.P. Classifying marine mammals signal using cubic splines interpolation combining with triple loss variational auto-encoder. Sci. Rep. 2023, 13, 19984.
Li, D.; Xu, Z.; Ostachowicz, W.; Cao, M.; Liu, J. Identification of multiple cracks in noisy conditions using scale-correlation-based multiscale product of SWPT with laser vibration measurement. Mech. Syst. Signal Process. 2020, 145, 106889.
Hossain, M.I.; Islam, M.S.; Khatun, M.T.; Ullah, R.; Masood, A.; Ye, Z. Dual-transform source separation using sparse nonnegative matrix factorization. Circuits Syst. Signal Process. 2021, 40, 1868–1891.
Yang, J.; Riser, S.; Thorsos, E.I. Open ocean ambient noise data in the frequency band of 100 Hz–50 kHz from the Pacific Ocean: A legacy of Jeffrey A. Nystuen. J. Acoust. Soc. Am. 2023, 153, A134. [Google Scholar] [CrossRef]
Wang, J.; Li, J.; Yan, S.; Shi, W.; Yang, X.; Guo, Y.; Gulliver, T.A. A novel underwater acoustic signal denoising algorithm for Gaussian/non-Gaussian impulsive noise. IEEE Trans. Veh. Technol. 2020, 70, 429–445. [Google Scholar] [CrossRef]
Lucke, K.; MacGillivray, A.O.; Halvorsen, M.B.; Ainslie, M.A.; Zeddies, D.G.; Sisneros, J.A. Recommendations on bioacoustical metrics relevant for regulating exposure to anthropogenic underwater sound. J. Acoust. Soc. Am. 2024, 156, 2508–2526. [Google Scholar] [CrossRef]
Urick, R.J. Principles of Underwater Sound; McGraw-Hill: New York, NY, USA, 1983. [Google Scholar]
Wen, C.S.; Lin, C.F.; Chang, S.H. EMD-Based Energy Spectrum Entropy Distribution Signal Detection Methods for Marine Mammal Vocalizations. Sensors 2023, 23, 5416. [Google Scholar] [CrossRef]
Siddagangaiah, S.; Chen, C.F.; Hu, W.C.; Akamatsu, T.; McElligott, M.; Lammers, M.O.; Pieretti, N. Automatic detection of dolphin whistles and clicks based on entropy approach. Ecol. Indic. 2020, 117, 106559. [Google Scholar] [CrossRef]
Cascão, I.; Lammers, M.O.; Prieto, R.; Santos, R.S.; Silva, M.A. Temporal patterns in acoustic presence and foraging activity of oceanic dolphins at seamounts in the Azores. Sci. Rep. 2020, 10, 3610. [Google Scholar] [CrossRef]
Mallat, S.G. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 674–693. [Google Scholar] [CrossRef]
Gridley, T.; Elwen, S.H.; Rashley, G.; Badenas Krakauer, A.; Heiler, J. Bottlenose dolphins change their whistling characteristics in relation to vessel presence, surface behavior and group composition. Proc. Meet. Acoust. 2016, 27, 010030. [Google Scholar]
Killick, R.; Fearnhead, P.; Eckley, I.A. Optimal detection of changepoints with a linear computational cost. J. Am. Stat. Assoc. 2012, 107, 1590–1598. [Google Scholar] [CrossRef]
Lavielle, M. Using penalized contrasts for the change-point problem. Signal Process. 2005, 85, 1501–1510. [Google Scholar] [CrossRef]
Zhao, X.; Xia, H.; Zhao, J.; Zhou, F. Adaptive wavelet threshold denoising for bathymetric laser full-waveforms with weak bottom returns. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Smith, T.A.; Rigby, J. Underwater radiated noise from marine vessels: A review of noise reduction methods and technology. Ocean Eng. 2022, 266, 112863. [Google Scholar] [CrossRef]
Sakai, M.; Haga, R.; Tsuchiya, T.; Akamatsu, T.; Umeda, N. Statistical analysis of measured underwater radiated noise from merchant ships using ship operational and design parameters. J. Acoust. Soc. Am. 2023, 154, 1095–1105. [Google Scholar] [CrossRef]
Miksis-Olds, J.L.; Bradley, D.L.; Maggie Niu, X. Decadal trends in Indian Ocean ambient sound. J. Acoust. Soc. Am. 2013, 134, 3464–3475. [Google Scholar] [CrossRef]
Shajahan, N.; Barclay, D.R.; Lin, Y.T. Quantifying the contribution of ship noise to the underwater sound field. J. Acoust. Soc. Am. 2020, 148, 3863–3872. [Google Scholar] [CrossRef]
Gillespie, D.; Mellinger, D.; Gordon, J.; Mclaren, D.; Redmond, P.; McHugh, R.; Trinder, P.; Deng, X.; Thode, A. PAMGUARD: Semiautomated, open source software for real-time acoustic detection and localisation of cetaceans. J. Acoust. Soc. Am. 2008, 30, 54–62. [Google Scholar]

Figure 1. Two-level decomposition with SWT.

Figure 2. Time–frequency spectrogram of a dolphin whistle signal with impulsive noise.

Figure 3. Comparison of the normalized autocorrelation function: (a) Noisy whistle signal. (b) Noise-only signal.

Figure 4. Scatter plot of Spearman’s rank correlation coefficients between whistle signals and typical ocean ambient noises.

Figure 5. Multiscale wavelet decomposition for signal content and structural composition analysis.

Figure 6. Flowchart of the proposed denoising SI-ACF method.

Figure 7. Optimized threshold interval for processing the detail coefficients $D_{w}$.

Figure 8. Narrowing the threshold interval using the trisection algorithm to identify the peak of the correlation metric.

Figure 9. Amplitude disparities between whistle signal and noise in the detail coefficients $D_{3}$.

Figure 10. Detail coefficients $D_{3}$ rearranged in descending order.

Figure 11. Suppression of impulsive noise in wavelet coefficients using a sliding window technique.

Figure 12. Denoised results for simulated whistle signals combined with vessel noise at different SNR levels: (a) Output SNR; (b) NRMSE.

Figure 13. Evaluation metrics for the denoised signals: (a) PCC; (b) correlation metric $ρ$ values.

Figure 14. Denoised results for simulated whistle signals combined with wind and wave noise at different SNR levels: (a) Output SNR; (b) NRMSE.

Figure 15. Evaluation metrics for the denoised signals: (a) PCC; (b) correlation metric $ρ$ values.

Figure 16. Denoised results for simulated whistle signals combined with background noise at different SNR levels: (a) Output SNR; (b) NRMSE.

Figure 17. Evaluation metrics for the denoised signals: (a) PCC; (b) correlation metric $ρ$ values.

Figure 18. Time–frequency spectrogram of dolphin whistle signal Wh_0052.

Figure 19. Denoised whistle signal Wh_0052 using different methods: (a) Sqtwolog; (b) Rigrsure; (c) Heursure; (d) Minimaxi; (e) Bayes; (f) SI-ACF.
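
The trisection search named in the caption of Figure 8 can be illustrated with a minimal sketch. It assumes the correlation metric has a single peak over the threshold interval; `toy_metric` is a hypothetical stand-in for $ρ$, and the function names, interval bounds, and tolerance are illustrative rather than taken from the paper.

```python
def trisection_max(metric, lo, hi, tol=1e-6, max_iter=200):
    """Approximate the maximizer of a single-peaked function on [lo, hi]."""
    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        # Split the current interval into three equal parts.
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if metric(m1) < metric(m2):
            lo = m1  # the peak cannot lie left of m1
        else:
            hi = m2  # the peak cannot lie right of m2
    return 0.5 * (lo + hi)


def toy_metric(threshold):
    # Hypothetical stand-in for the correlation metric: one peak at 0.4.
    return 1.0 - (threshold - 0.4) ** 2


best = trisection_max(toy_metric, 0.0, 1.0)
print(round(best, 4))
```

Each pass discards one third of the interval, so the number of metric evaluations grows only logarithmically with the required precision. If the metric has plateaus or several peaks, this sketch may settle on a local maximum.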

Table 1. Comparison of denoising results at an input SNR of −10 dB for three types of noise.

Metric	Noise	Sqtwolog	Rigrsure	Heursure	Minimaxi	Bayes	SI-ACF
$SNR_{out}$ (dB)	Vessel	3.35	7.80	9.33	5.11	9.66	13.03
	Wind and wave	1.18	11.12	11.39	2.90	8.32	13.34
	Background	−1.54	0.47	0.83	−1.11	0.50	4.20
$NRMSE$	Vessel	0.08	0.05	0.04	0.07	0.04	0.03
	Wind and wave	0.11	0.03	0.03	0.09	0.05	0.03
	Background	0.14	0.11	0.11	0.14	0.11	0.07
$PCC$	Vessel	0.81	0.92	0.94	0.87	0.94	0.98
	Wind and wave	0.62	0.97	0.97	0.82	0.96	0.98
	Background	0.07	0.65	0.66	0.21	0.54	0.85
$ρ$	Vessel	0.67	0.78	0.78	0.73	0.80	0.87
	Wind and wave	0.63	0.77	0.78	0.71	0.77	0.79
	Background	0.18	0.45	0.48	0.21	0.41	0.80
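
For readers who want to reproduce metrics of the kind reported in Table 1, the following is a minimal sketch using common textbook definitions of output SNR, NRMSE, and the Pearson correlation coefficient (PCC). The function names are illustrative, and normalizing the RMSE by the range of the clean signal is an assumption; the paper's exact normalization may differ.

```python
import math


def output_snr_db(clean, denoised):
    """Output SNR in dB: clean-signal power over residual-error power."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((x - y) ** 2 for x, y in zip(clean, denoised))
    if noise_power == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


def nrmse(clean, denoised):
    """RMSE normalized by the range of the clean signal (assumed convention)."""
    mse = sum((x - y) ** 2 for x, y in zip(clean, denoised)) / len(clean)
    return math.sqrt(mse) / (max(clean) - min(clean))


def pcc(clean, denoised):
    """Pearson correlation coefficient between the two signals."""
    mean_x = sum(clean) / len(clean)
    mean_y = sum(denoised) / len(denoised)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(clean, denoised))
    var_x = sum((x - mean_x) ** 2 for x in clean)
    var_y = sum((y - mean_y) ** 2 for y in denoised)
    return cov / math.sqrt(var_x * var_y)


# Toy data, not from the paper.
clean = [0.0, 1.0, 0.0, -1.0]
denoised = [0.1, 0.9, -0.1, -0.9]
snr_value = output_snr_db(clean, denoised)
nrmse_value = nrmse(clean, denoised)
pcc_value = pcc(clean, denoised)
```

Higher output SNR and PCC and lower NRMSE indicate that the denoised signal is closer to the clean reference, which is the direction of improvement shown for SI-ACF in the table.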

Table 2. PAMGuard configuration.

FFT parameters	FFT length	1024
	FFT hop	512
	Window	Hann
Whistle and moan detector	Thresholding (dB)	6.3
	Min frequency (Hz)	6000
	Max frequency (Hz)	11,000
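
To make the settings in Table 2 concrete, the sketch below derives the frame overlap, frequency resolution, and the FFT bins covered by the 6000–11,000 Hz detection band. The sampling rate is not given in this section, so `sample_rate_hz` is a hypothetical input, and the frame count assumes no padding at the signal edges.

```python
import math

FFT_LENGTH = 1024     # from Table 2
FFT_HOP = 512         # from Table 2
MIN_FREQ_HZ = 6000    # from Table 2
MAX_FREQ_HZ = 11000   # from Table 2


def stft_layout(sample_rate_hz, n_samples):
    """Summarize the time-frequency grid implied by the FFT settings."""
    bin_width_hz = sample_rate_hz / FFT_LENGTH
    if n_samples < FFT_LENGTH:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - FFT_LENGTH) // FFT_HOP
    return {
        "overlap": 1.0 - FFT_HOP / FFT_LENGTH,
        "bin_width_hz": bin_width_hz,
        "first_bin": math.ceil(MIN_FREQ_HZ / bin_width_hz),
        "last_bin": math.floor(MAX_FREQ_HZ / bin_width_hz),
        "n_frames": n_frames,
    }


# Hypothetical example: one second of audio sampled at 48 kHz.
layout = stft_layout(sample_rate_hz=48000, n_samples=48000)
```

With a hop of half the FFT length, consecutive frames overlap by 50%, which is a common choice for a Hann window.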

Table 3. The results with different threshold methods, as analyzed using PAMGuard.

Metric	PAMGuard	Sqtwolog	Rigrsure	Heursure	Minimaxi	Bayes	SI-ACF
$EDR$	38.67%	33.82%	41.31%	39.37%	38.72%	42.86%	53.45%
$MDR$	8.74%	6.80%	8.74%	8.74%	7.77%	8.74%	5.83%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhou, X.; Wu, R.; Chen, W.; Dai, M.; Zhu, P.; Xu, X. Thresholding Dolphin Whistles Based on Signal Correlation and Impulsive Noise Features Under Stationary Wavelet Transform. J. Mar. Sci. Eng. 2025, 13, 312. https://doi.org/10.3390/jmse13020312

