1. Introduction
Marine mammals are vital for maintaining the health of marine ecosystems. As top predators, species such as dolphins significantly influence the balance of the entire marine ecosystem. Dolphins are recognized as indicators of marine ecosystem health [1], and their pronounced responses to environmental change make them valuable ecosystem sentinels [2].
Dolphins communicate through acoustic signals, which are classified into three types: whistles, clicks, and burst pulses. All of these sounds serve social purposes [3]. Whistles, which are narrowband frequency-modulated signals, are the basis of vocal exchanges between individuals and convey information about identity, behavioral state, and environmental conditions [4]. Dolphins primarily use clicks for environmental perception and hunting, while burst pulses are mainly associated with social interactions and occur more frequently during socializing than during other behavioral states [5].
The acoustic parameters of dolphin whistle signals (e.g., start frequency, end frequency, minimum and maximum frequency, bandwidth, duration, and the number of inflection points) vary with factors such as activity status, group size and composition, geographic location, and ambient noise levels [6]. Bottlenose dolphins convey individual identity-related information through unique signature whistles. Studies show significant differences in the fundamental frequency contours of signature whistles among individual dolphins, with changes observed in the same dolphin across developmental stages [7]. Notable acoustic differences exist among dolphin populations and whistle types [8], with geographic isolation potentially contributing to intra-species vocalization variations [9]. Vessel activity and engine noise affect dolphin whistle parameters, especially when vessels are nearby [10]. In underwater communication, whistle signals are central to bionic covert underwater acoustic communication technologies, which use them to enable secure information transmission [11,12]. In summary, dolphin whistle signals have significant research value in ecology, biology, and underwater communication.
Passive acoustic monitoring (PAM) is a reliable and effective method for the long-term monitoring of marine mammals [13]. PAM supports monitoring marine life dynamics, promotes conservation efforts, and guides sustainable management practices. However, manually monitoring and annotating dolphin whistle signals from large PAM datasets is both time-consuming and labor-intensive. Additionally, ambient noise, especially anthropogenic noise, is inevitably introduced during signal acquisition. For instance, during a container ship’s passage, the broadband received sound-pressure level recorded by a hydrophone can reach 155 dB re 1 μPa [14], further complicating the extraction of whistle signal features.
Efficient and accurate automatic detection, extraction, and classification of whistle signals remain major challenges in the complex and dynamic underwater environment. Researchers have proposed various innovative approaches in recent years. Serra et al. used ridge detection on time–frequency spectrograms to extract whistle contours and applied a random forest classifier for whistle classification [15]. Kipnis and Diamant used an image clustering method to optimize the connection process between whistle-contour tracking and counting [16]. Li et al. proposed a staged generative adversarial network (GAN) framework to generate training samples for automatically extracting whistle signals from time–frequency spectrograms [17]. While spectrogram-based approaches show promising results in processing dolphin whistles, they struggle to restore the temporal sequence characteristics of whistle signals. Mallawaarachchi et al. proposed a method that suppresses transient noise on the time–frequency spectrogram and uses pixel-based differentiation to eliminate non-impulsive environmental noise, facilitating the extraction of the fundamental frequency of whistle signals while retaining phase information to reconstruct denoised signals [18]. However, when non-impulsive ambient noise and whistle signals have similar energy levels, subsequent processing struggles to distinguish noise from signal, hindering effective extraction. Gruden and White applied a multi-target frequency-tracking sequential Monte Carlo probability hypothesis density (SMC-PHD) filter for automatic dolphin whistle-contour extraction [19]. They also used dual normalization in the frequency and time domains to reduce noise and enhance whistle signal features before extraction [20]. However, insufficient noise reduction at low SNR limits the applicability of this approach in complex environments. In summary, noise masks the features of whistle signals in the time–frequency domain, and optimizing denoising techniques is crucial for improving the performance of existing methods in whistle signal detection, extraction, and classification.
Preprocessing denoising improves the visualization of dolphin whistle signals in time–frequency spectrograms, aiding the analysis of their acoustic properties. For non-stationary signals, commonly used preprocessing methods include wavelet denoising [21], empirical mode decomposition (EMD) [22], variational mode decomposition (VMD) [23], principal component analysis (PCA) [24], and adaptive filter-based dynamic denoising techniques [25,26]. Although these methods can effectively remove most noise based on noise levels or signal statistical characteristics, dolphin whistle signals transmitted through underwater acoustic channels may exhibit weak energy in certain parts. Denoising must preserve all signal features to enable subsequent processing and analysis. For example, explosive events may cause significant changes in dolphin whistle signals [27], while dolphins may employ amplitude compensation to cope with increased environmental noise and avoid signal masking [28]. Thus, the denoising process must adapt to both time-varying noise and variations in whistle signals. Wavelet denoising enables flexible threshold setting based on signal and noise characteristics at different scales. In contrast, other time–frequency denoising methods lack controllable parameters to adjust the degree of noise removal. Wavelet thresholding is therefore well suited to achieving controllable noise reduction while preserving the complete structure of dolphin whistle signals. Nevertheless, traditional thresholding methods and recent hybrid approaches integrating wavelet techniques typically rely on the statistical distribution of noise or signal characteristics to estimate the threshold [29,30,31], such as the median absolute deviation within wavelet coefficient levels and the noise intensity differences between levels. These methods often result in over-denoising or insufficient denoising under time-varying underwater noise conditions.
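As a concrete illustration of the noise-statistics thresholds discussed above, the following minimal sketch computes the widely used universal threshold (the rule behind Sqtwolog), with the noise level estimated from the median absolute deviation of one level of detail coefficients. The function names are illustrative and are not taken from the cited works. Because a single noise estimate is applied to the whole record, the threshold cannot follow noise that changes over time.

```python
import math
import statistics


def mad_sigma(detail):
    """Noise standard deviation estimated from the median absolute deviation."""
    # 0.6745 is the usual Gaussian consistency constant for the MAD estimator.
    return statistics.median(abs(c) for c in detail) / 0.6745


def universal_threshold(detail):
    """Universal threshold sigma * sqrt(2 ln N) for one level of coefficients."""
    return mad_sigma(detail) * math.sqrt(2.0 * math.log(len(detail)))
```

For a record whose noise level doubles halfway through, the returned value is still one number, so it tends to be too high for the quiet half or too low for the noisy half.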
This study uses stationary wavelet transform (SWT) decomposition, which enables an accurate approximation of the original signal components while requiring few parameter selections. The noise interfering with dolphin whistle signal analysis is categorized into impulsive and non-impulsive types, which are removed stepwise in the wavelet domain. The main contributions are summarized as follows:
- (1) A denoising metric, p, was introduced to estimate dolphin whistles under low SNR using signal correlation. A fast algorithm was developed to determine its optimal threshold, enabling the separation of fundamental frequency whistles from wavelet coefficients at different levels. The method effectively suppresses non-impulsive noise in the same frequency band while preserving whistle signal integrity and enhancing its distinct features;
- (2) A wavelet-level sliding window approach was proposed to address impulsive noise in wavelet coefficients. This method processes impulsive noise coefficients by analyzing amplitude differences within the window and incorporating neighboring amplitude levels. Time–frequency spectrogram observations confirm its effectiveness in removing impulsive noise while maintaining whistle signal continuity;
- (3) SWT decomposition levels were determined based on the dolphin whistle frequency range and the signal sampling rate to prevent excessive wavelet decomposition. The two proposed denoising methods (SI-ACF), relying on SWT decomposition, were validated using three types of typical underwater noise and real whistle data. The results demonstrate the SI-ACF method’s effectiveness in reducing noise interference in complex, time-varying underwater environments.
The rest of this paper is organized as follows. Section 2 outlines the fundamental principles of SWT decomposition and explains the basis for the proposed thresholding approach. Section 3 details the SI-ACF method denoising framework and processing steps. Section 4 evaluates the proposed SI-ACF method under three typical underwater noise scenarios (vessel noise, wind and wave noise, and background noise). It compares the denoising performance with other wavelet thresholding techniques and analyzes its application to a real whistle signal dataset. Finally, Section 5 presents the key conclusions and provides insights for future research directions.
3. Denoising Based on Wavelet Level-Dependent Thresholding
In complex underwater environments, changes in the relative position between the target sound source and the hydrophone result in Doppler frequency shifts. Additionally, multipath delays and the time-varying nature of the underwater acoustic channel create temporal variations in ocean ambient noise. These factors collectively affect the frequency and time distribution of the acquired signals, potentially dispersing the energy of the desired signal and increasing the complexity of signal processing. Wavelet decomposition uses multiscale analysis to reveal the signal’s content and structure, as shown in Figure 5. After extracting the signal content (detail coefficients at level 3) from the original signal, a significant magnitude difference is observed between the effective signal and noise in this wavelet sub-band. This difference forms a solid foundation for wavelet threshold denoising.
In addition to thresholding, wavelet processing requires selecting appropriate decomposition levels and wavelet bases. Most dolphin whistles have a fundamental frequency range of 2 to 30 kHz [41]. Given the limited frequency range of dolphin whistles, selecting too many decomposition levels for wavelet decomposition is unnecessary. Increasing decomposition levels only refines the frequency band of the approximation coefficients in the lower range, without affecting the whistle signal’s high-frequency components. Therefore, the excessive subdivision and thresholding of wavelet coefficients that lack whistle components is unnecessary and only increases computational complexity. Considering the fundamental frequency range of dolphin whistles and the coarse frequency subdivisions of wavelet sub-bands in [42], we propose Equation (6) to determine the optimal SWT decomposition level for dolphin whistle signals.
In Equation (6), l denotes the wavelet decomposition level, which is determined from the minimum fundamental frequency of the whistle signal and the signal sampling rate. As shown in Figure 1, the choice of wavelet basis affects frequency-band aliasing, with some overlap observed between wavelet coefficients at different levels. Processing with the Daubechies wavelet reduces frequency-band overlap, and this reduction improves as the number of vanishing moments increases. Compared to other wavelet bases in the wavelet family, Daubechies wavelets offer a greater range of vanishing moments, allowing for more flexible selection based on signal-processing requirements. This property effectively minimizes mutual interference between wavelet coefficients at different levels during multi-level denoising. The vanishing moment is a key parameter that determines the accuracy of wavelet functions in approximating or estimating polynomials. A higher vanishing moment enhances wavelet compactness, effectively filtering redundant information, but can cause overfitting if excessively high. A moderate vanishing moment is suitable for achieving a balance between accuracy and generalization in processing whistle signals.
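The idea of limiting the decomposition depth can be sketched as follows. Equation (6) is not reproduced in this sketch; instead it assumes ideal dyadic band splitting, in which the level-j detail band spans roughly fs / 2^(j+1) to fs / 2^j, and stops at the first level whose detail band reaches down to the minimum whistle frequency. The function names and the stopping rule are assumptions made for illustration.

```python
def swt_levels(fs_hz, f_min_hz):
    """Smallest level whose detail band lower edge is at or below f_min_hz.

    Assumes ideal dyadic splitting: level j covers [fs / 2**(j + 1), fs / 2**j].
    """
    if not 0 < f_min_hz < fs_hz / 2:
        raise ValueError("f_min_hz must lie between 0 and the Nyquist frequency")
    level = 1
    while fs_hz / 2 ** (level + 1) > f_min_hz:
        level += 1
    return level


def detail_band(fs_hz, level):
    """Nominal (low, high) frequency edges in Hz of the level-`level` detail band."""
    return fs_hz / 2 ** (level + 1), fs_hz / 2 ** level
```

Under these assumptions, a 96 kHz recording with a minimum whistle frequency of 6.5 kHz needs three levels, because the level-3 detail band nominally covers 6 to 12 kHz; a minimum of 2 kHz needs five.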
This section details the proposed SI-ACF denoising method, designed for the fundamental frequency of whistle signals under low-SNR conditions. SWT decomposition offers a multiscale time–frequency framework for whistle signal denoising. Building on this, the SI-ACF method employs two strategies to reduce noise: (1) a multi-level sliding window method detects impulsive discontinuities and suppresses impulsive noise in wavelet detail coefficients; (2) an efficient threshold search leverages whistle signal correlation to reduce non-impulsive noise.
3.1. Threshold Search Algorithm Utilizing Signal Correlation
Distinguishing whistle signals from non-impulsive noise is crucial for threshold selection, as it directly influences the balance between preserving whistle signal integrity and removing noise. Wavelet decomposition divides whistle signals into components spanning multiple frequency bands. It not only separates whistle signals from noise but also provides a multiscale analysis of the differences between them, aiding threshold selection. The complexity of underwater acoustic channels causes significant variations in noise across frequency bands and time intervals, necessitating appropriate thresholds for wavelet denoising at each level. Dynamic threshold setting is therefore essential, taking into account the magnitude of wavelet coefficients at each level and the signal’s characteristics.
Building on the correlation analysis between whistle signals and typical ocean noise in Section 2.2, we propose a wavelet threshold denoising metric p that does not require prior information. The metric is based solely on the correlation characteristics of whistle signals. It is defined through the autocorrelation function in Equation (5) and is expressed in Equation (7):
Here, m is the position at which the autocorrelation function first reaches zero, and p takes values in (0, 1). The metric evaluates the overall correlation of the signal, reflecting the strength of its periodic components. It also characterizes the correlation features of the whistle signal’s fundamental frequency. In Figure 3, red dots represent the p metric.
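To make the role of the first zero of the autocorrelation function concrete, the sketch below computes a correlation score of this general kind. The exact form of Equation (7) is not reproduced; as an assumption, the score is taken to be the mean absolute value of the normalized (biased) autocorrelation from lag m onward, which stays within [0, 1] and is larger for signals with strong periodic components. All function names are hypothetical.

```python
def autocorr(x):
    """Normalized biased autocorrelation of the mean-removed sequence x."""
    n = len(x)
    mean = sum(x) / n
    c = [v - mean for v in x]
    r0 = sum(v * v for v in c)
    if r0 == 0:
        raise ValueError("constant input has no defined autocorrelation")
    return [sum(c[i] * c[i + k] for i in range(n - k)) / r0 for k in range(n)]


def first_zero_lag(r):
    """First lag at which the autocorrelation reaches or crosses zero."""
    for k, v in enumerate(r):
        if v <= 0:
            return k
    return len(r)


def correlation_score(x):
    """Assumed stand-in for the metric: mean |r(k)| over lags k >= m."""
    r = autocorr(x)
    tail = r[first_zero_lag(r):]
    return sum(abs(v) for v in tail) / len(tail) if tail else 0.0
```

With this stand-in, a pure tone scores far higher than broadband noise of the same length, which is the property the threshold search relies on.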
Wavelet denoising removes high-frequency noise by processing detail coefficients while preserving approximation coefficients to maintain the signal’s main structure. However, due to the limited frequency range of the whistle signals, the approximation coefficients at the final level of the wavelet decomposition may not contain components of the whistle signals. Low-frequency wavelet coefficients primarily represent the signal’s overall trend and main structure. In wavelet level-dependent processing, unprocessed wavelet coefficients can interfere with the results of current-level processing, regardless of whether the sequence moves from high to low or low to high frequencies. Such interference may impact the stability of correlation-based denoising methods. To ensure reliability, the proposed SI-ACF denoising method follows the overall process shown in Figure 6.
The SI-ACF method for SWT-based multi-level denoising of whistle signals involves three steps:
- (1) First, Equation (6) is used to determine the wavelet decomposition level l, preventing over-decomposition into wavelet sub-bands without whistle coefficients and avoiding excessive processing;
- (2) Next, impulsive noise is suppressed by processing detail coefficients at each level using a sliding window, where detection is based on the segmented standard deviation of wavelet coefficients in descending order within the window; this approach reduces removal bias and selects the threshold based on neighboring amplitude levels, ensuring signal continuity without affecting other coefficients;
- (3) Finally, wavelet coefficient thresholds for processing non-impulsive noise at each level are selected based on the correlation metric p from Equation (7), while an optimized processing sequence minimizes coefficient interference and enhances the stability of metric p.
The detailed steps of the wavelet level-dependent threshold denoising method for processing non-impulsive noise using the correlation metric are outlined below.
To minimize interference between wavelet coefficients across levels, the correlation metric p is used to determine whether the approximation coefficients from the last decomposition level, as defined in Equation (6), contain whistle signal components. Performing l-level SWT decomposition yields a set of wavelet coefficients, consisting of the final-level approximation coefficients and the detail coefficients at each level. Next, the correlation results are calculated for signals reconstructed from individual wavelet coefficients. Similarly, a second set of correlation results is calculated after excluding individual coefficients from the complete set. Based on this correlation analysis, approximation coefficients are processed using Equation (8) to improve the stability of the denoising process. The variable w identifies the detail coefficient level that contains the primary components of the whistle signal. Among the correlation results, the detail coefficient corresponding to the maximum correlation value is taken as the level-w detail coefficient, representing the whistle signal’s most correlated component.
If component is fully retained in the previous step, it should be prioritized during the second phase of level-dependent threshold denoising based on the correlation method. For the wavelet coefficients processed at each level, a specific threshold exists that maximizes the whistle signal correlation metric p. As noise is progressively removed, p increases; however, once the effective signal is compromised, p starts to decrease. To illustrate the threshold determination method based on signal correlation, take the detail coefficients at level j (where j ≠ w) from the wavelet decomposition as an example. A threshold range is selected as the optimal threshold search interval. The trisection method is used to identify the threshold corresponding to the maximum value of p within this range, which is then applied to process the detail coefficients at the current level.
The final step focuses on processing the level-w detail coefficients. Since these coefficients are crucial for the whistle signal’s overall correlation, removing noise coefficients in the same frequency band can significantly influence the correlation. As noise gradually decreases, if part of the whistle signal is disrupted, the remaining segments may exhibit a higher correlation. Thus, redefining the threshold range is essential for optimizing denoising performance. According to [43], dolphin whistle signals typically last longer than 0.1 s. During the processing of the level-w detail coefficients, the signal is segmented into fixed-length windows. Absolute values within each window are averaged to identify the minimum average value and the overall mean value of all windows. The trisection method is then applied within the threshold range derived from these values to find the threshold that maximizes the correlation metric p, as shown in Figure 7. Finally, the optimal threshold is used to process the level-w detail coefficients, and the processed wavelet coefficients are reconstructed to yield the denoised signal.
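The windowed statistics used to redefine the search range can be sketched as follows. The window length and the exact mapping of the two statistics to the range bounds are not reproduced here, so the sketch assumes non-overlapping windows of a caller-supplied length (for example, the number of samples in 0.1 s) and simply returns the minimum window average and the overall mean of the window averages. The function names are hypothetical.

```python
def window_abs_means(coeffs, win_len):
    """Mean absolute value of each complete, non-overlapping window."""
    if win_len <= 0:
        raise ValueError("win_len must be positive")
    means = []
    for start in range(0, len(coeffs) - win_len + 1, win_len):
        segment = coeffs[start:start + win_len]
        means.append(sum(abs(v) for v in segment) / win_len)
    return means


def window_statistics(coeffs, win_len):
    """Return (minimum window average, overall mean of the window averages)."""
    means = window_abs_means(coeffs, win_len)
    if not means:
        raise ValueError("input is shorter than one window")
    return min(means), sum(means) / len(means)
```

The minimum window average reflects the quietest stretch of the record, where mostly noise is present, while the overall mean also includes windows dominated by the whistle.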
As illustrated in Figure 8 for processing the detail coefficients at a given level, the specific steps for searching the optimal threshold using the trisection method are as follows:
- (1) Define the lower and upper bounds of the threshold search range. Calculate the correlation result when the threshold is set to the lower bound, which corresponds to the scenario where the current level of detail coefficients is unprocessed. Similarly, calculate the correlation result when the threshold is set to the upper bound, representing the scenario where the current level of detail coefficients is entirely removed;
- (2) The search interval is iteratively narrowed using the trisection algorithm. For the k-th iteration, the trisection algorithm selects two interior points that divide the current interval into three parts and calculates the corresponding correlation results;
- (3) Compare the correlation results at the two bounds and the two interior points. If the result at the lower bound is the maximum, update the threshold search interval to run from the lower bound to the first interior point; if the result at the upper bound is the maximum, update the interval to run from the second interior point to the upper bound; if the result at the first or second interior point is the maximum, update the interval to run from the lower bound to the second interior point or from the first interior point to the upper bound, respectively;
- (4) After each update of the threshold interval, k is incremented by 1, and the next iteration is performed until the termination condition in Equation (9) is satisfied:
At this point, the threshold corresponding to the maximum value of the correlation metric p within the threshold interval (as indicated by the peak point in Figure 8) is determined to be the optimal threshold for processing the detail coefficients at that level.
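The search procedure can be sketched as a four-point trisection over a score that rises and then falls with the threshold. This is a minimal illustration under stated assumptions: the termination rule of Equation (9) is replaced by a simple interval-width tolerance, the interval updates follow the usual reasoning for a single-peaked score, and `score` is a hypothetical callable standing in for "threshold the coefficients, reconstruct, and evaluate the correlation metric".

```python
def trisection_max(score, lo, hi, tol=1e-3, max_iter=100):
    """Locate the maximizer of a single-peaked `score` on [lo, hi]."""
    f_lo, f_hi = score(lo), score(hi)
    for _ in range(max_iter):
        if hi - lo <= tol:          # assumed stand-in for the termination rule
            break
        t1 = lo + (hi - lo) / 3.0   # first interior point
        t2 = hi - (hi - lo) / 3.0   # second interior point
        f1, f2 = score(t1), score(t2)
        best = max(f_lo, f1, f2, f_hi)
        if best == f_lo:            # peak lies in [lo, t1]
            hi, f_hi = t1, f1
        elif best == f_hi:          # peak lies in [t2, hi]
            lo, f_lo = t2, f2
        elif best == f1:            # peak lies in [lo, t2]
            hi, f_hi = t2, f2
        else:                       # peak lies in [t1, hi]
            lo, f_lo = t1, f1
    # Return whichever remaining endpoint scores higher.
    return lo if f_lo >= f_hi else hi
```

Each iteration needs two new score evaluations and shrinks the interval to at most two thirds of its previous width, so the number of reconstructions grows only logarithmically with the required threshold resolution.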
3.2. Suppression of Impulsive Noise
Dolphin whistle signals recorded by hydrophones often contain random impulsive noise, as shown in Figure 2 and Figure 5. This noise may originate from the environment or from dolphin click signals. Its presence interferes with extracting the characteristic parameters of whistle signals.
The amplitude distribution of wavelet coefficients in Figure 5 reveals that impulsive noise does not maintain high amplitudes across the entire frequency band. Using the signal in Figure 2 as an example, the optimal-level wavelet decomposition based on Equation (6) was applied, and all detail coefficients were rearranged in descending order. The reordered curves of detail coefficients across all levels show a consistent trend.
Figure 9 shows the amplitude distribution of the detail coefficients, while Figure 10 depicts the curve after rearrangement in descending order. Abrupt high-amplitude variations at the curve’s ends are concentrated in very short intervals, corresponding to wavelet coefficients of impulsive noise. Noise regions marked in red in Figure 9 appear as flat, low-amplitude areas in Figure 10, representing the main components of non-impulsive noise. In summary, the primary wavelet coefficients of whistle signals differ from noise coefficients in amplitude. Combined with the amplitude distribution of wavelet coefficients in Figure 5, this feature offers a clear basis for distinguishing and removing both impulsive and non-impulsive noise.
Impulsive noise shows differences between low and high amplitude values, as illustrated in Figure 9. Selecting the threshold based on the overall descending order of detail coefficients at each level may fail to remove low-amplitude impulsive noise effectively. To detect impulsive noise coefficients and set appropriate thresholds, this subsection introduces a sliding window method for the threshold processing of detail coefficients at each level, as shown in Figure 11. The sliding window method mitigates removal biases caused by amplitude differences between low- and high-amplitude impulsive noise coefficients.
The proposed impulsive noise suppression method focuses on identifying significant magnitude change points in the descending-order curve, treating the positive and negative magnitude parts in Figure 10 separately. Change points within each sliding window are detected using a MATLAB R2023b change-point detection function [44,45], which evaluates segment differences and maximizes the standardized variance differences between segments. Taking the positive magnitude part as an example, for a window of length L the method divides the positive part of the descending-order curve into two intervals on either side of a candidate change point. The change point and the corresponding threshold are calculated based on the standard deviation differences between these intervals. The same procedure is applied to the negative magnitude part to calculate its threshold. Each window ℓ of the level-j detail coefficients therefore has one threshold for its positive section and one for its negative section. Finally, impulsive noise coefficients within the window are eliminated based on Equation (10):
In Equation (10), the coefficients being processed are those of the ℓ-th window of the level-j detail coefficients. L is the window length, set to 10% of the signal sampling rate (i.e., the number of samples in 0.1 s). S is the step size, equal to half of the window length. K is the total number of windows. The mean values of all positive and all negative coefficients within the window are given by Equations (12) and (13), respectively.
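The window layout described above can be sketched as follows, with L equal to 10% of the sampling rate and S equal to half of L. The expression used for the number of windows K (complete windows only, with no padding of the final partial window) is an assumption, as is the function name.

```python
def sliding_windows(n_samples, fs_hz, fraction=0.10):
    """Return (L, S, starts) for half-overlapping windows over n_samples."""
    win_len = int(round(fraction * fs_hz))   # L: 10% of the sampling rate
    step = win_len // 2                      # S: half of the window length
    if win_len < 2 or n_samples < win_len:
        raise ValueError("signal is shorter than one window")
    n_windows = (n_samples - win_len) // step + 1   # K, complete windows only
    return win_len, step, [k * step for k in range(n_windows)]
```

For a 0.8 s record sampled at 96 kHz (76,800 samples), this gives L = 9600, S = 4800, and K = 15 windows, the last of which ends exactly at the final sample. Because consecutive windows overlap by half, every coefficient away from the record edges is examined in two windows.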
Unlike median filtering, the proposed method prevents energy dispersion in whistle signals caused by signal averaging. Compared to clipping and wave-shaping, the sliding window with an adaptive threshold modulation strategy selectively removes impulsive noise coefficients. This method improves flexibility and accuracy while maintaining the integrity and continuity of the whistle signal.
4. Results and Discussion
To comprehensively evaluate the proposed SI-ACF method’s denoising performance, validation experiments were conducted using both simulated and real whistle signals. Simulated whistle signals were constructed using chirp-like signals, allowing the generation of different whistle types, with Gaussian envelopes applied to further mimic the frequency and amplitude variations characteristic of real dolphin whistles. Simulated whistle signals provide a clean reference signal, making them suitable for quantifying denoising effectiveness under various noise levels and types. In contrast, real whistle signals lack a noise-free reference, making it difficult to compare numerical results.
For the simulated whistle signal, three types of noise (vessel noise, wind and wave noise, and background noise) measured in Xiamen Bay were used. These three typical noise types were selected based on the different frequency ranges and sound-pressure spectrum levels of ocean noise summarized in the Wenz curve [46], and because they are commonly encountered and representative in real-world scenarios. Based on Equation (14), the simulated whistle signal was combined with these noise types at varying SNR levels to generate noisy signals. The synthesized signals were processed using the thresholding rule and denoising strategy of the proposed SI-ACF method, along with the Sqtwolog, Rigrsure, Heursure, Minimaxi, and Bayes thresholds [47], to obtain the denoised signals. The output SNR was calculated using Equation (15) to quantify the quality of the denoised signal. Furthermore, the normalized root mean square error (NRMSE) from Equation (16) was used to measure the difference between the denoised and reference signals, while the Pearson correlation coefficient (PCC) from Equation (17) assessed their degree of linear association.
In Equation (17), the covariance between the reference signal and the denoised result is normalized by the standard deviations of these two signals. Additionally, the metric p is calculated to evaluate the denoised results. Combined with the other evaluation metrics, p offers additional validation for the effectiveness of the proposed method.
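The evaluation pipeline can be sketched with the standard textbook forms of these quantities. Equations (14) to (17) are not reproduced here, so the definitions below are assumptions: noise is scaled to reach a target input SNR, the output SNR compares reference power with residual power, the NRMSE is normalized by the range of the reference signal (other normalizations are also common), and the PCC is the covariance divided by the product of the standard deviations. All function names are illustrative.

```python
import math


def mean_power(x):
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal + noise has the requested input SNR."""
    gain = math.sqrt(mean_power(signal) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def output_snr_db(ref, den):
    """10 log10 of reference energy over residual energy (residual must be nonzero)."""
    residual = sum((r - d) ** 2 for r, d in zip(ref, den))
    return 10 * math.log10(sum(r * r for r in ref) / residual)


def nrmse(ref, den):
    """RMSE normalized by the range of the reference signal (assumed convention)."""
    rmse = math.sqrt(sum((r - d) ** 2 for r, d in zip(ref, den)) / len(ref))
    return rmse / (max(ref) - min(ref))


def pcc(ref, den):
    """Pearson correlation coefficient between reference and denoised signals."""
    mr, md = sum(ref) / len(ref), sum(den) / len(den)
    cov = sum((r - mr) * (d - md) for r, d in zip(ref, den))
    sr = math.sqrt(sum((r - mr) ** 2 for r in ref))
    sd = math.sqrt(sum((d - md) ** 2 for d in den))
    return cov / (sr * sd)
```

Note that the PCC is insensitive to a pure amplitude scaling of the denoised signal, whereas output SNR and NRMSE both penalize it, which is one reason the metrics are reported together.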
4.1. Denoising Results for Simulated Whistles with Three Typical Noise Types
The proposed SI-ACF method is designed to preprocess and denoise dolphin whistle signals in underwater scenarios with low SNR conditions. To ensure the reliability and generalizability of our findings, the denoising performance was evaluated across multiple sets of simulated whistle signals of different types, combined with different noise conditions. Synthetic whistle signals with input SNR values from −20 to 0 dB were used as experimental subjects in the simulation experiments. The simulated whistle signals adopted in subsequent analyses were configured with a duration of 0.8 s, a sampling rate of 96 kHz, and a frequency range of 6.5–9 kHz. To ensure fair result comparisons, all thresholding denoising methods were applied under consistent SWT decomposition conditions. Specifically, the db20 wavelet basis, decomposition levels determined by Equation (6), and soft thresholding were applied. Additionally, all wavelet coefficients, including the final level’s approximation coefficients, were included in the processing.
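For reference, the soft-thresholding rule applied to the coefficients in all compared methods can be sketched as follows: coefficients whose magnitude does not exceed the threshold are set to zero, and the remaining ones are shrunk toward zero by the threshold. The function name is illustrative.

```python
import math


def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by `thr`; zero those within [-thr, thr]."""
    if thr < 0:
        raise ValueError("threshold must be non-negative")
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]
```

A threshold of zero leaves the coefficients unchanged, and a threshold at or above the largest coefficient magnitude removes the level entirely, which correspond to the two ends of a threshold search range.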
4.1.1. Vessel Noise
Ship noise is a common anthropogenic noise source in underwater environments, and its underwater radiated noise typically includes mechanical noise, propeller and flow noise, and noise generated by cavitation effects [48]. Among these, propeller cavitation is the primary source of ship underwater radiated noise [49], and as observed from the Wenz curve, the spectral range of this broadband noise overlaps with the communication frequency range of dolphin whistles.
Figure 12a shows that the SI-ACF method significantly outperforms other thresholding methods under low SNR conditions. For input SNRs between −18 dB and 0 dB, it achieves over 5 dB improvement compared to Sqtwolog and Minimaxi, and about 3 dB better performance than Bayes and Heursure. At −20 dB, it performs slightly worse than Bayes and Heursure. Figure 12b supports the same conclusion. The Sqtwolog and Minimaxi methods, which rely on noise estimation for threshold selection, struggle in low-SNR underwater environments because the noise varies significantly over time. This limitation primarily stems from estimating noise statistics from global wavelet coefficients, which averages out the temporal differences in noise characteristics. Such averaging leads to an inaccurate noise-level estimate, which in turn biases the threshold selection and degrades denoising performance. The SI-ACF method, by contrast, adaptively selects thresholds by leveraging the whistle signal’s correlation characteristics across the entire time segment to distinguish it from noise.
Figure 13a,b show the results of two correlation metrics, demonstrating that the SI-ACF method consistently achieves higher correlation than other thresholding methods. This differs from the results in Figure 12 under −20 dB conditions, likely due to the enhancement of ship spectral line noise, a type of narrowband noise dominated by low-frequency fundamental tones and their harmonics, typically generated by periodic mechanical vibrations such as engine rotation or propeller cavitation. This narrowband noise appears in the frequency domain as regular peaks (such as fundamental and harmonic components), which introduces bias in the threshold selection for the correlation metric p. Low-frequency noise with a trend can also influence the threshold selection of the correlation metric p, leading to a reduction in denoising performance. Nevertheless, Figure 13 shows that the SI-ACF method surpasses other thresholding techniques in preserving whistle signal integrity, as evidenced by both the PCC and the p metric.
4.1.2. Wind and Wave Noise
In the low-frequency range of typical underwater spectra, ship traffic noise dominates, while above this frequency range, wind-driven wave sounds become predominant [
50]. The size of wind waves is determined by three factors: wind strength, wind duration, and fetch (i.e., the distance over which the wind blows across uninterrupted water surface in a constant direction).
Figure 14 shows that the SI-ACF method consistently outperforms other methods, while Sqtwolog and Minimaxi exhibit the poorest performance. The Bayes thresholding method struggles to select the optimal threshold based on the prior distribution of wind and wave noise coefficients, leading to poorer denoising performance in this noise scenario compared to vessel noise scenarios. The primary frequency bands of wind and wave noise substantially overlap with the whistle signals’ fundamental frequency, complicating the threshold selection for the Bayes method. Conversely, vessel noise is predominantly confined to low-frequency ranges, facilitating more accurate threshold estimation for the Bayes approach under these conditions. In terms of output SNR and NRMSE, the SI-ACF method demonstrates stable denoising performance under wave noise conditions. Wind and wave noise exhibit non-Gaussian, non-stationary characteristics, with a time-varying spectral energy distribution driven by environmental dynamics. These noise properties result in the more severe degradation of whistle signals within overlapping frequency bands; however, the SI-ACF method reliably overcomes these challenges.
Figure 15a shows that the SI-ACF method achieves a PCC consistently above 0.8, indicating a high level of linear correlation between the denoised results and the reference whistle signal. This high correlation is further corroborated by the proposed correlation metric p, as shown in Figure 15b, which attains values above 0.69 across all tested SNR levels. In contrast, the Sqtwolog and Minimaxi methods both have PCC values below 0.3 and values of the metric p below 0.35 within the −20 dB to −16 dB SNR range. Such low correlations indicate that these conventional methods over-suppress valid signal components during denoising, severely compromising the integrity of the whistle structure. The SI-ACF method employs correlation to differentiate between noise-dominant and signal-dominant regions, thereby avoiding excessive denoising.
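The PCC used above is the standard Pearson correlation coefficient between the reference and denoised waveforms. The sketch below shows why it tracks structural preservation rather than amplitude; the test signal and the helper name `pearson_cc` are hypothetical and stand in for a real whistle and the authors' implementation.

```python
import numpy as np

def pearson_cc(reference, denoised):
    """Pearson correlation coefficient between two equal-length signals."""
    x = np.asarray(reference, dtype=float)
    y = np.asarray(denoised, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    return float(np.sum(x * y) / np.sqrt(np.sum(x**2) * np.sum(y**2)))

# Synthetic stand-in for a whistle: 5 Hz tone, 1 s at 1 kHz (hypothetical values).
t = np.arange(1000) / 1000.0
reference = np.sin(2 * np.pi * 5 * t)

scaled = 0.5 * reference            # shape preserved, amplitude halved
over_suppressed = reference.copy()
over_suppressed[100:] = 0.0         # most of the signal removed

pcc_scaled = pearson_cc(reference, scaled)
pcc_over = pearson_cc(reference, over_suppressed)
```

Uniform attenuation leaves the PCC at 1, whereas removing most of the waveform lowers it sharply, which is consistent with reading low PCC values as a sign of over-suppression.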
4.1.3. Background Noise
Background noise is composed of a mixture of various noise sources in the marine environment, reflecting the complex and dynamic acoustic conditions underwater. These sources include a combination of natural and anthropogenic components.
Figure 16 shows that the SI-ACF method significantly outperforms other thresholding methods in both output SNR and NRMSE. Unlike the other two noise types, background noise not only comprises low-frequency components and frequency bands that overlap with whistle signals but also includes impulsive noise. Because of this complexity, the improvements in output SNR and NRMSE are smaller than those observed with the other two noise types. Nevertheless, the SI-ACF thresholding method effectively balances the removal of impulsive and non-impulsive noise components. The differences among the other thresholding methods are minor, with Sqtwolog and Minimaxi performing worst within the −12 dB to 0 dB range.
Background noise more accurately represents the environment where the whistle signal is captured. Figure 17 shows that, according to both the PCC and the proposed correlation metric p, the SI-ACF method is better able to preserve the whistle signal’s main structure during denoising. Figure 17a shows that other thresholding methods cannot effectively separate noise from the whistle signal, leading to a low linear correlation with the reference signal at SNRs below −16 dB. Figure 17b presents the correlation assessment of the denoised results using the proposed metric p, revealing that other methods significantly degrade the whistle signal’s main structure, resulting in lower values of p.
Table 1 summarizes the denoising performance at −10 dB input SNR across three noise types. The numerical results clearly show that the denoising performance of each method deteriorates under background noise conditions, as the shallow sea background noise in the inactive neighborhood of the whistle signal is more complex, with more dramatic variations in the same frequency band. However, SI-ACF still maintains the best denoising capability, effectively distinguishing whistle content that has been obscured by noise, and leveraging the high energy of impulse noise and the inherent correlation characteristics of the whistle signal. In contrast, noise estimation or wavelet coefficient characteristic-based thresholding methods fail to accurately determine the threshold under conditions where both noise and whistle signals vary in the time–frequency domain.
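Fixing the input SNR, as in Table 1, requires scaling the noise recording before adding it to the clean whistle. The mixing procedure is not restated in this section, so the following is a sketch of one common approach, assuming SNR is defined from mean signal and noise power; `mix_at_snr` is a hypothetical helper name and the signals are synthetic.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal power / scaled-noise power equals snr_db."""
    signal = np.asarray(signal, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_signal = np.mean(signal**2)
    p_noise = np.mean(noise**2)
    # Gain that brings the noise power to p_signal / 10^(snr_db / 10).
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise, gain

# Synthetic stand-ins (hypothetical): a tone as the whistle, Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(4800) / 48000.0
whistle = np.sin(2 * np.pi * 8000 * t)
noise = rng.standard_normal(t.size)

noisy, gain = mix_at_snr(whistle, noise, snr_db=-10.0)
achieved = 10.0 * np.log10(np.mean(whistle**2) / np.mean((gain * noise) ** 2))
```

At −10 dB the scaled noise carries ten times the power of the whistle, which is the operating point summarized in Table 1.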
4.2. Denoising Results for Real Whistles
To evaluate denoising on real dolphin whistle signals, we used a dataset from Fremantle Inner Harbour [6] and analyzed the processed signals using PAMGuard software (version 2.02.14) [51]. The PAMGuard whistle detector identifies whistles by searching for energy ridges in the sound. However, the whistle signals in this dataset are weak, which makes them difficult to detect. Consequently, 40 audio files containing 103 whistle signals, representing various whistle structures, were manually selected from the dataset using Raven Pro 1.6 software (Cornell Laboratory of Ornithology, Ithaca, NY, USA) for the experiment. The PAMGuard configuration is detailed in Table 2.
Due to the complex ambient noise in this dataset, the other thresholding methods are prone to erroneously deleting valid signals when applied in level-dependent mode. Therefore, for each of these methods, the better of its level-independent and level-dependent results is selected for comparison.
Figure 18 shows the whistle signal Wh_0052, which is severely contaminated by impulsive noise and also contains ship spectral line noise. Figure 19 compares the time–frequency spectrograms of the denoised results obtained with the various thresholding methods. Only the proposed SI-ACF method accurately identifies the whistle signal’s frequency band and removes irrelevant wavelet coefficients, while the other methods retain some irrelevant coefficients. Furthermore, when denoising within the same frequency band, the SI-ACF method avoids over-denoising, preserving more whistle energy and making the whistle more visible in the time–frequency spectrogram. Noise estimation-based methods prioritize noise reduction over preserving valid signals, leading to imbalanced threshold determination. With the other thresholding methods, except for Bayes and Minimaxi, whistle signal components are visibly weakened. Bayes and Minimaxi retain more noise because level-dependent methods tend to over-denoise, while level-independent methods are insufficient for suppressing most noise. Although impulsive noise coefficients have little effect on numerical metrics such as output SNR and NRMSE, the time–frequency spectrogram clearly demonstrates the SI-ACF method’s superior ability to remove these coefficients. Furthermore, the second whistle in Figure 18 exhibits a breakpoint at 2.5 s. The SI-ACF method does not enlarge this gap during denoising, whereas methods that over-denoise widen it, potentially causing the whistle signal to be identified as two separate signals.
When detecting whistle contours with PAMGuard, a single whistle signal may be split into multiple signals, noise might be misclassified as a whistle, and some whistles might be missed altogether. Using the 103 selected whistles as the reference set, we recorded the number of correctly identified whistles, incorrectly identified whistles, and missed whistles. Since the frequency range of whistle detection was predefined and the whistle count was small, all methods showed minimal false detection. Moreover, PAMGuard’s configuration allows modifications to lower the false detection rate. The denoising performance was evaluated using Equation (18) for the effective detection rate (EDR) and Equation (19) for the missed detection rate (MDR).
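Equations (18) and (19) are not reproduced in this section, so the following sketch rests on an assumption: that the EDR is the share of reference whistles correctly identified and the MDR is the share of reference whistles missed, both as percentages. The function and argument names are hypothetical, and the counts in the example are made-up illustrative numbers, not results from Table 3.

```python
def detection_rates(n_reference, n_correct, n_missed):
    """Return (EDR, MDR) in percent under the assumed definitions.

    EDR = 100 * correctly identified / reference count   (assumption)
    MDR = 100 * missed / reference count                 (assumption)
    """
    if n_reference <= 0:
        raise ValueError("reference set must contain at least one whistle")
    edr = 100.0 * n_correct / n_reference
    mdr = 100.0 * n_missed / n_reference
    return edr, mdr

# Illustrative counts only (not taken from the paper's results).
edr, mdr = detection_rates(n_reference=100, n_correct=80, n_missed=15)
```

Under these definitions a whistle split into several detections is neither correct nor missed, so EDR and MDR need not sum to 100%.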
The PAMGuard detection results obtained without wavelet denoising preprocessing are presented as a baseline. As shown in Table 3, after denoising, the SI-ACF method achieved the highest effective detection rate and the lowest missed detection rate. The reduction in missed detections is likely due to attenuation of the noise surrounding weak whistle signals, which makes their energy components more prominent. In contrast, the effective detection rate of the Sqtwolog method was lower than that obtained without wavelet preprocessing, while the Minimaxi method improved the effective detection rate by only 0.05% over the no-preprocessing case, further confirming that noise estimation threshold methods struggle to adapt to underwater noise environments. The SI-ACF method improved the EDR by approximately 14.78%, while methods such as Rigrsure, Heursure, and Bayes showed minimal improvement. The limited improvement of these methods can be attributed to two main causes. First, under low SNR, the weak energy at the starting, ending, and some turning points of the whistle signal meant that some components of the whistle were not effectively detected, so that single whistle signals were counted as multiple signals. Second, impulsive noise disrupted the continuity of the whistle signal in the time–frequency domain, which also caused single whistle signals to be split. Therefore, removing impulsive noise is crucial for improving the accuracy of whistle detection. In summary, the SI-ACF wavelet threshold denoising method effectively removes irrelevant noise by applying precise thresholds, while preserving the integrity of the whistle signals and enhancing the effectiveness of automatic whistle detection.
5. Conclusions
This study proposes a wavelet thresholding denoising method based on signal and noise characteristic estimation (SI-ACF) for the fundamental frequency signals of dolphin whistles in low-SNR underwater environments. The method is designed to replace traditional thresholding approaches that rely on noise-level estimation or wavelet coefficient features, enabling precise threshold determination while preserving the integrity of whistle signals. The frequency range of dolphin whistles is used to determine and limit the decomposition levels in the SWT, which prevents over-processing. The method suppresses impulsive noise by incorporating amplitude levels from adjacent time segments, which prevents over-denoising. Furthermore, it leverages the correlation characteristics of whistle signals to achieve accurate signal estimation, effectively retaining signal components during noise suppression.
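The idea of limiting decomposition levels by the whistle frequency range can be sketched as follows. This is not the authors' selection rule, which is not restated here; it assumes the nominal dyadic band of the level-j detail coefficients, (fs / 2^(j+1), fs / 2^j], and keeps the levels whose band overlaps the whistle band. The sampling rate and whistle band are hypothetical values.

```python
def levels_covering_band(fs, f_low, f_high, max_level):
    """Return detail levels whose nominal dyadic band overlaps [f_low, f_high].

    Level j detail coefficients nominally span (fs / 2**(j + 1), fs / 2**j].
    Real wavelet filters have softer band edges, so this is an approximation.
    """
    selected = []
    for level in range(1, max_level + 1):
        band_high = fs / 2**level
        band_low = fs / 2 ** (level + 1)
        if band_low < f_high and band_high > f_low:
            selected.append(level)
    return selected

# Hypothetical example: 96 kHz sampling, whistle energy between 4 and 20 kHz.
levels = levels_covering_band(fs=96000, f_low=4000, f_high=20000, max_level=6)
```

With these illustrative numbers only levels 2 to 4 overlap the whistle band, so thresholding by correlation would be confined to those levels while the remaining levels can be treated as noise-only.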
The SI-ACF method was validated using simulated whistles combined with three typical types of underwater noise, as well as real whistle data. The results show that SI-ACF overcomes the limitations of noise-estimation-based wavelet denoising methods in time-varying underwater noise environments. However, the autocorrelation-based metric is currently limited to processing the fundamental frequency of whistle signals. Under low-SNR conditions, accurately identifying harmonic components of whistle signals remains challenging, even for human operators. Furthermore, when the noise contains coherent components, the SI-ACF method may yield insufficient denoising performance. Nevertheless, the proposed method shows strong potential for broader application to other periodic or quasi-periodic signals. Future work will focus on adaptively optimizing the thresholding function to improve same-frequency denoising accuracy for nonlinear, non-stationary whistle signals in dynamic noise environments. This work not only improves the accuracy of wavelet-based denoising but also contributes to understanding how environmental variations influence bioacoustic behaviors and shape species-specific vocalization patterns, providing valuable insights into adaptive mechanisms and supporting conservation efforts.