Article

Parallel Net: Frequency-Decoupled Neural Network for DOA Estimation in Underwater Acoustic Detection

Advanced Interdisciplinary Technology Research, National Innovation Institute of Defense Technology Center, Beijing 100071, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2025, 13(4), 724; https://doi.org/10.3390/jmse13040724
Submission received: 21 February 2025 / Revised: 25 March 2025 / Accepted: 2 April 2025 / Published: 4 April 2025
(This article belongs to the Topic Advances in Underwater Acoustics and Aeroacoustics)

Abstract

Under wideband interference conditions, traditional neural networks often suffer from low accuracy in single-frequency direction-of-arrival (DOA) estimation and face challenges in detecting single-frequency sound sources. To address this limitation, we propose a novel model called Parallel Net. The architecture adopts a frequency-parallel design: it first employs a recurrent neural network, the generalized feedback gated recurrent unit (GFGRU), to independently extract features from each frequency component, and then it fuses these features through an attention mechanism. This design significantly enhances the network’s capability in estimating the DOA of single-frequency signals. The simulation results demonstrate that when the signal-to-noise ratio (SNR) exceeds −10 dB, Parallel Net achieves a mean absolute error (MAE) below 2°, outperforming traditional frequency-coherent neural networks and the MUSIC algorithm, and reduces the error to half that of classical beamforming (CBF). Further validation on the SWellEx-96 experiment confirms the model’s effectiveness in detecting single-frequency sources under wideband interference. Parallel Net exhibits superior sidelobe suppression and fewer spurious peaks compared to CBF, achieves higher accuracy than MUSIC, and produces smoother and more continuous DOA trajectories than conventional neural network models.

1. Introduction

Direction-of-arrival (DOA) estimation is a fundamental technique in underwater acoustic detection, with its origins tracing back to Bartlett’s beamforming method proposed in the 1950s [1]. Traditional DOA estimation methods, such as classical beamforming, MUSIC algorithms, and maximum likelihood estimation, have been widely applied in both theoretical studies and practical scenarios. However, their performance in complex underwater environments is significantly limited by noise interference and multipath effects. In recent years, the rapid development of deep learning technology has provided new solutions to the problem of sound source localization. Numerous studies have explored the use of neural networks to improve localization accuracy and environmental adaptability. For example, Niu et al. [2] utilized feed-forward neural networks (FNNs) for sound source localization, while Huang et al. [3] combined TDNN [4] and CNN-FNN [5] structures to process time-domain and frequency-domain signals, demonstrating their effectiveness in shallow water environments. For more complex sound source signals in deep sea environments, convolutional neural networks such as Inception [6] and ResNet50 [7] have been widely employed to process array signal covariance matrices, achieving further improvements in localization accuracy [8,9,10]. Moreover, recurrent neural networks (RNNs) [11] have been applied to Bayesian processes [12] or enhanced with attention mechanisms to improve prediction accuracy [13,14,15]. In the context of DOA prediction, Niu and Liu explored the feasibility of using FNNs and CNNs for DOA estimation [16,17], but their studies were limited to single-frequency signals. Similarly, Xie [15] investigated the application of RNNs to signals received by a single-vector hydrophone, and Li examined the feasibility of using CNN-RNN combinations for array-based multi-frequency DOA prediction [18], though neither addressed multi-source scenarios comprehensively.
Currently, models leveraging broadband signals often treat neural networks [8,9,10,11,12] as a ‘black box’, typically operating within the framework of frequency-coherent methods [19], with architectures evolving toward increasing complexity [20]. While such networks exploit frequency interrelations to some extent, their high computational complexity and sensitivity to environmental changes limit their practicality. Additionally, preprocessing methods introduce extra uncertainty during network input: (1) independent normalization may amplify noise at each specific frequency without signals; and (2) global normalization might obscure weak signals, leading to the loss of critical information. These limitations are particularly pronounced under complex interference conditions. Recent studies [21] have applied neural networks to enhance beamforming outcomes. Although beamforming methods are non-coherent, this work mainly investigates fusion-enhanced networks in the later stages of processing, while non-coherent alternatives receive limited experimental attention. Furthermore, existing studies lack targeted solutions when key frequency features are obscured by interference.
To address these challenges, this study proposes a Parallel Net model inspired by the group convolution structure [22,23]. Unlike traditional frequency-coherent methods [8,9,10,11,12], this model incorporates a frequency-incoherent [24] approach by introducing a grouping mechanism in RNNs. This allows the model to independently process multi-frequency information, thereby decoupling frequency interrelations and mitigating interference between frequency components. During the frequency information fusion phase, the model employs an attention mechanism [25] to suppress irrelevant information and emphasize key features, enhancing its adaptability to complex environments and improving localization accuracy for single-frequency signals. The experimental results show that the proposed method enhances DOA estimation performance under broadband interference, offering a robust and efficient approach to address the limitations of conventional methods in complex acoustic scenarios.

2. Architecture of Parallel Net

2.1. Model Input

As shown in Figure 1, suppose N sound sources impinge on an array with M elements. The signal received by the m-th array element can be expressed as follows:

x_m(t) = \sum_{n=1}^{N} s_n(t)\, a_{m,n} + n_m(t), \quad m = 1, 2, \ldots, M,

where s_n(t) represents the signal of the n-th sound source; a_{m,n} is the element of the array manifold between the n-th sound source and the m-th receiving array element; and n_m(t) denotes the additive white Gaussian noise at the m-th element, which is assumed to be independent of the signal.
When N sound sources s_1(t), s_2(t), \ldots, s_N(t) impinge on the array at frequencies f_1, f_2, \ldots, f_N, the received signal at frequency f can be expressed as follows:

X(f) = A S + N = \left[ a(\theta_1), a(\theta_2), \ldots, a(\theta_N) \right]_{M \times N} \left[ s_1(f), s_2(f), \ldots, s_N(f) \right]_{1 \times N}^{T} + N

= \begin{bmatrix} 1 & 1 & \cdots & 1 \\ e^{j 2\pi d \sin\theta_1 / \lambda} & e^{j 2\pi d \sin\theta_2 / \lambda} & \cdots & e^{j 2\pi d \sin\theta_N / \lambda} \\ \vdots & \vdots & \ddots & \vdots \\ e^{j 2\pi d (M-1) \sin\theta_1 / \lambda} & e^{j 2\pi d (M-1) \sin\theta_2 / \lambda} & \cdots & e^{j 2\pi d (M-1) \sin\theta_N / \lambda} \end{bmatrix} \begin{bmatrix} s_1(f) \\ s_2(f) \\ \vdots \\ s_N(f) \end{bmatrix} + \begin{bmatrix} N_1(f) \\ N_2(f) \\ \vdots \\ N_M(f) \end{bmatrix}

where A(f) is the array manifold matrix, S(f) is the signal vector, and N(f) is the noise vector. A(f) is a Vandermonde matrix, and as long as M ≥ N (i.e., the number of sound sources does not exceed the number of array elements), A(f) has full column rank.
The angular information of sound sources is embedded in the array-received signal, enabling the deep neural network to model complex information for extracting the incident angles of sound sources.
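To make the signal model above concrete, the following minimal NumPy sketch builds the array manifold for a uniform linear array and synthesizes one frequency-domain snapshot X(f) = AS + N. The element spacing, sound speed, source bearings, and amplitudes are illustrative assumptions, not values from the paper.

```python
import numpy as np

def steering_matrix(thetas_deg, M, d, freq, c=1500.0):
    """Array manifold A(f) of a uniform linear array (a minimal sketch).

    thetas_deg: source bearings in degrees; M: number of elements;
    d: element spacing in m; freq: frequency in Hz; c: assumed sound speed in m/s.
    Returns an M x N complex matrix whose n-th column is a(theta_n).
    """
    lam = c / freq                                              # wavelength
    m = np.arange(M)[:, None]                                   # element indices 0..M-1
    phase = 2j * np.pi * d * m * np.sin(np.deg2rad(np.atleast_1d(thetas_deg)))[None, :] / lam
    return np.exp(phase)

# One frequency-domain snapshot X(f) = A S + N (all numeric values are illustrative)
A = steering_matrix([40.0, 120.0], M=27, d=1.875, freq=79.0)
S = np.array([1.0 + 0.0j, 0.5 + 0.0j])                          # complex source amplitudes at f
noise = 0.01 * (np.random.randn(27) + 1j * np.random.randn(27))
X = A @ S + noise                                               # received vector at frequency f
```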
Typically, deep neural networks use single-frequency localization methods in which the network input is the covariance matrix of the signal at frequency f. Compared with time-domain signals or X(f), the covariance matrix has a smaller data volume and reduces the impact of the phase of the source excitation, which allows better performance with less training data. The covariance matrix R(f) can be expressed as follows:

R(f) = E\left[ X(f) X^{H}(f) \right],
For multi-frequency localization, the network input consists of the covariance matrices of multiple frequency points:
\{ R(f_1), R(f_2), \ldots, R(f_F) \},
where F denotes the number of frequency points. If the number of array elements is M and the signal contains F frequency points, the input data dimension is M² × F × 2.
The covariance matrix at each frequency point is normalized by its maximum magnitude, as follows:

R_{\mathrm{norm}}(f) = \dfrac{R(f)}{\max \left| R(f) \right|}
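As a concrete illustration, the sketch below estimates the per-frequency sample covariance matrices from frequency-domain snapshots, normalizes each by its maximum magnitude, and stacks real and imaginary parts into the M² × F × 2 input layout described above; the snapshot-averaged estimate and the array layout are assumptions.

```python
import numpy as np

def covariance_inputs(snapshots):
    """Build the normalized multi-frequency covariance input (a sketch of the steps above).

    snapshots: complex array of shape (n_snap, M, F) holding frequency-domain snapshots
    for M array elements at F selected frequency bins (this layout is an assumption).
    Returns a real array of shape (F, M, M, 2).
    """
    n_snap, M, F_bins = snapshots.shape
    out = np.zeros((F_bins, M, M, 2))
    for f in range(F_bins):
        Xf = snapshots[:, :, f]                          # (n_snap, M), one row per snapshot
        R = (Xf.T @ Xf.conj()) / n_snap                  # sample estimate of E[X X^H]
        R = R / np.max(np.abs(R))                        # normalize by maximum magnitude
        out[f, :, :, 0], out[f, :, :, 1] = R.real, R.imag
    return out
```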

2.2. Details of the Parallel Net Model

Inspired by group convolution, this paper proposes the Parallel Net model based on the generalized feedback gated recurrent unit (GFGRU) [11] layer, which groups frequency points. Compared with a GRU [26], the GFGRU uses a global reset gate that allows each hidden layer to adaptively read and retain the states of all hidden layers from the previous time step, resulting in faster convergence and higher accuracy during training. The gating mechanism of the GFGRU is shown in Equation (A1) of Appendix A.1, and the structure of the GFGRU is shown in Figure A1 of Appendix A.2.
As shown in Figure 2, Parallel Net consists of 10 grouped GFGRU layers and squeeze-and-excitation (SE) attention layers [23]. Compared with the multi-head attention used in transformers, the SE layer is a computationally lightweight attention mechanism: it selectively emphasizes informative channels by assigning them higher weights through a learned gating operation. In the feature extraction phase, the sparse Bayesian learning network constructed from the GFGRU performs independent operations for each frequency to extract the location information of the sound source. The SE layer then weights each channel based on the activation features of each frequency point, focusing more on effective frequency points. Finally, the average of the outputs from the Softmax [27] layer of each frequency point is used as the localization result. The detailed parameters are shown in Figure 3.
For comparison and analysis, two versions of the Parallel Net model are proposed, V1 and V2, while V3 serves as a baseline model based on the frequency-coherent approach [12]. The structural parameters of these three neural network models are shown in Figure 3, and a simplified code sketch of the frequency-decoupled design is given after the list below. The complexity evaluation of the different methods is shown in Table A1.
  • V1: Consists of a sparse Bayesian learning network formed by 10 groups of GFGRU layers. Each group contains 256 hidden units, 10 time steps, and 2 stacked layers. The output is a vector of size 360, representing the direction of arrival of the sound source. Thus, the output vector dimension of the GFGRU layer is 10 × 360 = 3600.
  • V2: Built upon V1 by incorporating an SE attention layer. The SE layer applies channel-wise weighting with minimal computation, emphasizing effective frequency points for better localization.
  • V3: Serves as a frequency-coherent baseline model constructed using non-grouped GFGRU layers. Each group contains 256 hidden units, 10 time steps, and 2 stacked layers, with the output dimension being 360.
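The following PyTorch sketch illustrates the frequency-decoupled structure of V1/V2: one recurrent branch per frequency point, an SE layer that re-weights the per-frequency outputs (V2), and averaging of the per-frequency softmax probabilities. For brevity, a standard GRU stands in for the GFGRU, and each 27 × 27 × 2 covariance matrix is fed as a 27-step sequence of 54-dimensional features, which is an assumption about the input layout; this is a structural sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation over the frequency dimension (channel-wise re-weighting)."""
    def __init__(self, n_freq, reduction=2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(n_freq, n_freq // reduction), nn.ReLU(),
            nn.Linear(n_freq // reduction, n_freq), nn.Sigmoid())

    def forward(self, x):                                  # x: (batch, n_freq, n_doa)
        w = self.fc(x.mean(dim=-1))                        # squeeze: average over DOA bins
        return x * w.unsqueeze(-1)                         # excite: per-frequency weights

class ParallelNetSketch(nn.Module):
    """Frequency-decoupled sketch of V1/V2: one recurrent branch per frequency point.

    A plain GRU stands in for the GFGRU; feature shapes are illustrative assumptions.
    """
    def __init__(self, n_freq=10, feat_dim=54, hidden=256, n_doa=360, use_se=True):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True) for _ in range(n_freq))
        self.heads = nn.ModuleList(nn.Linear(hidden, n_doa) for _ in range(n_freq))
        self.se = SEBlock(n_freq) if use_se else nn.Identity()    # V2 adds the SE layer

    def forward(self, x):                                  # x: (batch, n_freq, 27, feat_dim)
        logits = []
        for f in range(len(self.branches)):
            _, h = self.branches[f](x[:, f])               # each frequency processed independently
            logits.append(self.heads[f](h[-1]))            # final hidden state -> 360 logits
        z = self.se(torch.stack(logits, dim=1))            # (batch, n_freq, 360), SE re-weighting
        return torch.softmax(z, dim=-1).mean(dim=1)        # average per-frequency probabilities
```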

2.3. Loss

In this study, DOA prediction is formulated as a classification problem where the output represents the directional probabilities of sound sources. For classification tasks, the Softmax [27] function is used to map the raw outputs of the network into a probability distribution, while the cross-entropy [28] loss function minimizes the discrepancy between the predicted and true distributions.
For a sample containing N frequency points, the averaged Softmax output of the j-th node is defined as follows:

\bar{P}_j = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{\exp(z_{i,j})}{\sum_{j'=1}^{C} \exp(z_{i,j'})}

where z_{i,j} is the raw output value of the j-th node for the i-th frequency point, C is the total number of output nodes (i.e., the number of DOA classes, C = 360), and \bar{P}_j represents the averaged predicted probability for class j.
The cross-entropy loss is then expressed as follows:

E = -\sum_{j=1}^{C} y_j \ln \bar{P}_j

where y_j is the ground-truth label of the sample for class j.
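A minimal PyTorch version of the averaged-Softmax output and cross-entropy loss above could look as follows; the tensor shapes and the multi-hot label encoding are assumptions.

```python
import torch
import torch.nn.functional as F

def frequency_averaged_cross_entropy(logits, target):
    """Cross-entropy on the frequency-averaged softmax output (a sketch of the loss above).

    logits: (batch, n_freq, 360) raw per-frequency outputs z_{i,j}
    target: (batch, 360) multi-hot ground-truth direction labels y_j
    (shapes and the multi-hot encoding are assumptions)
    """
    p_bar = F.softmax(logits, dim=-1).mean(dim=1)          # average the per-frequency probabilities
    return -(target * torch.log(p_bar + 1e-12)).sum(dim=-1).mean()
```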

2.4. Dataset and Training Details

2.4.1. Training Set

The training set was generated through simulations in MATLAB-R2022b, following the experimental array configuration of the SWellEx-96 experiment [29]. The incident angles ranged from 1° to 360° in 1° increments, corresponding to noise-free sources. The sampling rate was 3276.8 Hz, with a frequency resolution of 0.8 Hz and a frequency band of 72–79 Hz, including 10 frequency points. By combining the incident angles of two sound sources, a total of C_{360}^{2} = 64,980 training samples were generated. During training, a random frequency point was selected for each sample, and white noise with an amplitude of 0.1–1.4 times the signal strength was added.
Algorithm 1 outlines the process of generating the simulated signals in this paper; a condensed code sketch follows the algorithm:
Algorithm 1. The process of generating simulated signals
  • Generate the received signal of the array for the direction-of-arrival (DOA) angle θ: X_{27×10}(θ)
  • Superimpose the received signals of the two sound sources: X_{27×10} = X_{27×10}(θ_1) + X_{27×10}(θ_2)
  • Apply the inverse Fourier transform to obtain the time-domain sequence: F^{-1}\{X_{27×10}\} = S_{27×4096}
  • Add noise: S = S + N
  • Apply the Fourier transform to return to the frequency domain: F\{S_{27×4096}\} = X_{27×10}
  • Calculate the covariance matrix at each frequency point f: R(f) = E[X(f) X^{H}(f)], forming \{R(f_1), R(f_2), \ldots, R(f_{10})\}
  • Take the real and imaginary parts of each matrix to form a real-valued input of size 10 × 27 × 27 × 2 for the network.
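A condensed NumPy sketch of Algorithm 1 is given below. It reuses the steering_matrix helper from the earlier sketch, idealizes the array as a uniform linear array, and treats the noise level and the single-snapshot covariance estimate as assumptions.

```python
import numpy as np

def simulate_sample(theta1, theta2, freq_bins, M=27, n_fft=4096, noise_scale=0.5, rng=None):
    """Generate one two-source training sample (a sketch of Algorithm 1, ULA geometry assumed).

    freq_bins: FFT bin indices of the 10 selected frequencies (0.8 Hz resolution).
    Returns the real/imaginary network input of shape (len(freq_bins), M, M, 2).
    """
    rng = rng or np.random.default_rng()
    spec = np.zeros((M, n_fft // 2 + 1), dtype=complex)
    for k in freq_bins:
        A = steering_matrix([theta1, theta2], M, d=1.875, freq=k * 0.8)   # bin index -> Hz
        spec[:, k] = A @ np.ones(2)                        # superimpose the two sources
    s = np.fft.irfft(spec, n=n_fft, axis=1)                # time-domain sequence, (M, 4096)
    s = s + noise_scale * np.std(s) * rng.standard_normal(s.shape)        # add white noise
    X = np.fft.rfft(s, axis=1)                             # back to the frequency domain
    out = np.zeros((len(freq_bins), M, M, 2))
    for i, k in enumerate(freq_bins):
        R = np.outer(X[:, k], X[:, k].conj())              # single-snapshot covariance
        R = R / np.max(np.abs(R))
        out[i, :, :, 0], out[i, :, :, 1] = R.real, R.imag
    return out
```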

2.4.2. Testing Sets

The testing sets were generated using MATLAB simulations, with the array receiving signals at incident angles ranging from 1° to 360° in 1° increments. The scenarios were divided into cases where the sound sources have identical or different frequencies and include the following three types of signal combinations:
  • Two identical sound sources (72–79 Hz): Two sound sources with identical frequencies impinge on the array from different directions. The incident angles range from 1° to 360° in 1° increments for one source and from 360° to 1° in 1° increments for the other. Noise is added to all sources.
  • Three identical sound sources (72–79 Hz): Three identical sound sources impinge on the array from different directions. The incident angles of sources 1 and 2 range from 1° to 360° and from 360° to 1°, respectively. The third source impinges from the direction of 180°. All sources operate in the frequency band of 72–79 Hz, and noise is added to simulate real-world conditions.
  • Three distinct sound sources (72–79 Hz): Three sound sources with distinct frequency bands impinge on the array from different directions. The incident angles of sources 1 and 2 range from 1° to 360° and from 360° to 1°, respectively, with source 1 covering the frequency band of 72–75 Hz and source 2 covering 76–79 Hz. The third source, spanning the frequency band of 72–79 Hz, impinges from the direction of 180°. Figure 4 shows the frequency range representation of the three sound sources. Noise is added to all sources.
In the simulation, partial frequency loss was simulated to represent scenarios where only ocean ambient noise was received, as illustrated in Figure 5. After the time-domain data of each array element were transformed into the frequency domain using Fourier transform, among the selected 10 frequency points, frequencies from 78 Hz to 72 Hz were progressively replaced with white noise to “mask” the signal. Specifically, 2, 4, and 6 frequency points were masked in different cases. In this way, in scenarios with “three distinct sound sources,” the number of frequency points consistently followed: source 2 < source 1 < source 3. When masking 4 or 6 frequency points, source 2 was effectively reduced to a single-frequency signal at 79 Hz.
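The frequency-masking step could be implemented as in the sketch below; the noise level used to replace the masked bins is an assumption, since the paper does not state the exact level.

```python
import numpy as np

def mask_frequency_points(spec, masked_bins, rng=None):
    """Replace selected frequency bins with complex white noise (masking sketch).

    spec: complex spectrum of shape (M, n_bins); masked_bins: indices of the bins to mask.
    The replacement noise level is matched to the original bin magnitudes (an assumption).
    """
    rng = rng or np.random.default_rng()
    M, _ = spec.shape
    level = np.mean(np.abs(spec[:, masked_bins]))
    noise = (rng.standard_normal((M, len(masked_bins)))
             + 1j * rng.standard_normal((M, len(masked_bins))))
    spec[:, masked_bins] = level * noise / np.sqrt(2)       # signal replaced by white noise
    return spec
```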

3. Experiments and Result Analysis

3.1. Evaluation Method

In this study, the performance of different models is evaluated under various scenarios using two key metrics: mean absolute error (MAE) and root mean square error (RMSE). These metrics are widely used in direction-of-arrival (DOA) estimation tasks to measure the deviation between the predicted and true DOA angles, providing a quantitative assessment of model accuracy.
From the model’s prediction results, the top n angles with the highest probabilities are selected to form a prediction set. The true angles of the sound sources constitute the label set. For each true angle in the label set (denoted as j = 1, 2, …, m), the closest angle in the prediction set is chosen as the predicted DOA.
For a testing set containing K samples, each with m sound sources, the mean absolute error (MAE) is computed as follows:

\mathrm{MAE} = \dfrac{1}{K \cdot m} \sum_{k=1}^{K} \sum_{j=1}^{m} \left| \hat{\theta}_{k,j} - \theta_{k,j}^{\mathrm{label}} \right|,

The root mean square error (RMSE) is computed as follows:

\mathrm{RMSE} = \dfrac{1}{m} \sum_{j=1}^{m} \sqrt{ \dfrac{1}{K} \sum_{k=1}^{K} \left( \hat{\theta}_{k,j} - \theta_{k,j}^{\mathrm{label}} \right)^{2} },

where \hat{\theta}_{k,j} represents the predicted angle for the j-th sound source in the k-th sample, and \theta_{k,j}^{\mathrm{label}} is the corresponding true angle.
The evaluation focuses on two primary scenarios:
  • Sound sources with identical frequencies.
  • Sound sources with different frequencies.
Additionally, real-world data from the SWellEx-96 experiment are used to validate the models on multi-frequency signal localization. The results and analyses for each scenario are detailed in the following sections.
Algorithm 2 describes the evaluation process for the traditional methods (CBF/MUSIC) and the neural networks (V1/V2/V3); a code sketch follows the algorithm:
Algorithm 2. Evaluation Process
  • Prediction result of each sample: P(θ), θ ∈ \{1, 2, \ldots, 360\}
  • If CBF or MUSIC: perform peak detection.
  • Select the top n values as candidate angles: \Theta_{\mathrm{top}\text{-}n} = \arg\,\mathrm{top}\text{-}n \; P(\theta)
  • Choose the angle closest to each label as the predicted value: \hat{\theta}_j = \arg\min_{\theta_i \in \Theta_{\mathrm{top}\text{-}n}} \left| \theta_i - \theta_j^{\mathrm{label}} \right|
  • Calculate the MAE/RMSE.
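A NumPy sketch of this evaluation procedure, combining top-n candidate selection, nearest-label matching, and the MAE/RMSE of Section 3.1, is shown below; the circular angular distance is an assumption, since the equations above are written as plain differences.

```python
import numpy as np

def evaluate(pred_prob, labels_deg, n_top):
    """Algorithm 2 sketch: top-n peak picking, nearest-label matching, then MAE/RMSE.

    pred_prob: (K, 360) directional probabilities (or a peak-detected spatial spectrum
    for CBF/MUSIC); labels_deg: (K, m) true DOAs in degrees (1..360).
    """
    K, m = labels_deg.shape
    abs_err = np.zeros((K, m))
    for k in range(K):
        candidates = np.argsort(pred_prob[k])[-n_top:] + 1      # top-n angles, 1..360
        for j in range(m):
            diff = np.abs(candidates - labels_deg[k, j])
            diff = np.minimum(diff, 360 - diff)                 # circular distance (assumption)
            abs_err[k, j] = diff.min()                          # closest candidate to this label
    mae = abs_err.mean()
    rmse = np.sqrt((abs_err ** 2).mean(axis=0)).mean()          # RMSE per source, then averaged
    return mae, rmse
```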

3.2. DOA Estimation for Identical Frequency Sound Sources

Testing sets 1 and 2 consist of sound sources with identical frequencies. This section analyzes the performance of V1, V2, and V3 using these testing sets. Testing set 1 contains two sound sources (m = 2), and the top two points (n = 2) with the highest probabilities from the DOA prediction results are selected to calculate the MAE and RMSE, as shown in Figure 6 and Figure 7.
Overall, when the two identical sound sources are masked at the same frequency points, the MAE and RMSE of all three models increase as the number of masked frequency points grows. For SNR > −10 dB, V2 and V3 exhibit similar performance in terms of the MAE and RMSE. However, when 4 to 6 frequency points are masked, V3 experiences a more significant increase in the MAE, exceeding 0.1°, which is slightly higher than that of V2.
In scenarios with SNR < −10 dB, V1 and V2 exhibit notable increases in both the MAE and RMSE, while V3 demonstrates better performance in maintaining stability as the SNR decreases.
Testing set 2 contains three identical sound sources (m = 3). From the DOA prediction results, the top six points (n = 6) with the highest probabilities are selected to calculate the MAE and RMSE, as shown in Figure 8 and Figure 9. Similar to Figure 6 and Figure 7, the errors reach their maximum when 6 frequency points are masked.
For SNR > −15 dB, V3 demonstrates the most stable performance, with the MAE remaining below 0.4° and the RMSE remaining below 2.5°. However, V2 also performs competitively in this range, with MAE and RMSE values only slightly higher than those of V3. In scenarios with 0–2 masked frequency points, both V1 and V2 achieve low MAE values (approximately 1.5°), with V2 demonstrating better consistency as the number of masked points increases.
When 4–6 points are masked, V1 experiences a significant increase in the MAE, exceeding 2.5°, while V2 maintains lower error rates compared to V1. Compared to scenarios with two sound sources, adding a third source results in larger RMSE variations for V1 and V2, which increase with the number of masked frequency points. Notably, when four points are masked, the RMSE of V1 exceeds 5°, whereas V2 shows better adaptability and maintains a more moderate error increase.
These results indicate that for sound sources with identical frequency points, while V3 excels in handling multiple sources, V2 also demonstrates strong robustness and competitive performance, especially under lower masking conditions.
Figure 10 visualizes the DOA prediction results of V1, V2, and V3 under the condition of SNR = −20 dB, for two and three sound sources with identical frequency characteristics, when 6 frequency points are simultaneously masked. The x-axis represents the predicted DOA angles, while the y-axis corresponds to the sequence of test samples.
For both the two-source and three-source scenarios, V3 demonstrates relatively clear DOA prediction patterns. By contrast, while V1 and V2 also exhibit distinct angular variations, their predictions include spurious values at positions other than the true sources. These spurious predictions increase as the number of masked frequency points grows, introducing noise and complicating the determination of the actual source positions.

3.3. DOA Estimation for Distinct Frequency Sound Sources

In this section, the performance of V1, V2, and V3 is evaluated using testing set 3, which contains three sound sources with distinct angles and frequencies (m = 3). From the DOA prediction results, the top six points (n = 6) with the highest probabilities are selected to calculate the MAE and RMSE, as shown in Figure 11 and Figure 12.
The results indicate that the MAE and RMSE of all models are affected by the number of masked frequency points. When 4 frequency points are masked, all three models exhibit their highest MAE values. At this point, source 2 has only one usable frequency for DOA prediction, making it the weakest signal among the three sources. This has the greatest impact on V3, where the MAE exceeds 16° and the RMSE surpasses 22°.
Under higher SNR conditions (SNR > −10 dB), V2 demonstrates the best performance, maintaining an MAE below 1.5° and an RMSE under 5.2°. While V3 achieves an MAE of approximately 1° with no masked frequency points, its performance degrades significantly when frequency points are masked. Specifically, with two masked points, V3's MAE exceeds 6°, and its RMSE surpasses 14°, making it the least robust model among the three in such scenarios.
These results highlight V2’s strong robustness and superior adaptability under higher SNR conditions, particularly in scenarios with masked frequency points. V2’s ability to handle weaker signals and maintain stable predictions makes it a better choice in multi-source environments.
Figure 13 visualizes the DOA prediction results of V1, V2, and V3 for sources 1, 2, and 3 under the condition of SNR = −20 dB. The x-axis represents the predicted DOA angles, while the y-axis corresponds to the sequence of the test data.
Overall, when frequency points are masked, the impact of frequency loss on V2's prediction results is minimal, followed by V1, while V3 is the most affected. Specifically, V3 loses its ability to accurately predict the position of source 2 when 2 frequency points are masked. When 6 frequency points are masked, source 2 becomes entirely unobservable in V3's predictions, as corroborated by Figure 11, indicating that V3 can no longer effectively detect source 2. By contrast, both V1 and V2 maintain relatively clear predictions for source 2, despite being affected by the masking.
Figure 14 illustrates the MAE and RMSE of the three neural network models (V1, V2, and V3) and the spectral estimation techniques (CBF and MUSIC) for source 2 when 4 frequency points are masked. The results show that the MAE and RMSE of V1 and V2 decrease as the SNR increases, stabilizing at SNR = −10 dB. For V2, the MAE remains close to 2°, and the RMSE stays under 15°. However, V3 exhibits a significantly different trend, with its MAE and RMSE increasing slowly as the SNR increases. Notably, V3's MAE remains above 49°, and its RMSE exceeds 68° across all SNR conditions, far surpassing those of V2. Between the two spectral estimation techniques (CBF and MUSIC), CBF demonstrates superior performance for source 2, with an MAE below 5° at −10 dB SNR. As shown, CBF's MAE remains approximately double that of V2 at SNRs above −10 dB.
This indicates that V3 loses the ability to detect single-frequency targets, such as source 2, under these conditions. More detailed results are shown in Table A8 and Table A9.
Based on the above analysis, it can be concluded that when the target sound source has fewer frequencies within the detection band and broadband interfering sources are present simultaneously, the frequency-coherent network structure of V3 tends to ‘ignore’ the target source. By contrast, V1, which employs a frequency-incoherent network structure for information extraction, demonstrates strong adaptability to such scenarios. Building upon V1, V2 incorporates an attention mechanism during the prediction phase to fuse multi-frequency information, resulting in more stable predictions. This allows V2 to achieve relatively accurate DOA predictions even for sound sources with only a single frequency point.

3.4. Evaluation of DOA Models Using Data of SWellEx-96 Experiment

This study utilizes the Event-S59 data from the SWellEx-96 [29] experiment for further comparison. The SWellEx-96 experiment was conducted from 10 to 18 May 1996, approximately 12 km off Point Loma near San Diego, California. The experimental data (test data) were recorded on 13 May 1996, between 11:45 and 12:50, using the HLA North array with a sampling rate of 3276.8 Hz, under conditions with significant interference. The towed sound source emitted five sets of 13 tones, including a 79 Hz tone. Figure 15 shows the tracks of the sound source in the SWellEx-96 experiment and the interfering source.
The HLA North array is a horizontal array with a 240 m aperture deployed on the seafloor. The bearing from the first to the last array element was oriented 34.5 degrees clockwise from true North. The array elements were arranged in a slightly bow-shaped configuration, as illustrated in Figure 16.
This study performs DOA estimation for the first 60 min of Event-S59 data using three neural network models (V1, V2, and V3) as well as traditional methods, including CBF [30] and MUSIC [31]. The data were sampled at 3276.8 Hz with a frequency resolution of 0.8 Hz, covering a frequency band of 72–79 Hz. Within this band, the towed source had a single frequency point at 79 Hz, while the interfering source spanned the entire band of 72–79 Hz. The results are shown in Figure 17 and Figure 18, where green triangles represent the trajectory of the towed source, and red circles indicate the trajectory of the interfering source.
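The band extraction described above could be performed as in the following sketch; the snapshot length and the use of non-overlapping snapshots are assumptions (a 4096-point FFT at 3276.8 Hz gives the stated 0.8 Hz bin spacing).

```python
import numpy as np

def extract_band_snapshots(x, fs=3276.8, n_fft=4096, f_lo=72.0, f_hi=79.0):
    """Split array time series into FFT snapshots and keep the 72-79 Hz bins (preprocessing sketch).

    x: (M, n_samples) real time series from the array; with fs = 3276.8 Hz and a 4096-point
    FFT the bin spacing is 0.8 Hz. Non-overlapping snapshots are an assumption.
    """
    M, n = x.shape
    n_snap = n // n_fft
    frames = x[:, :n_snap * n_fft].reshape(M, n_snap, n_fft)
    spec = np.fft.rfft(frames, axis=-1)                        # (M, n_snap, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)                   # bins in the processing band
    return spec[:, :, band], freqs[band]
```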
Figure 17 presents the DOA estimation results of V1, V2, V3, and CBF without frequency masking, using a total of 10 frequency points. Among the models, V2 achieves the best performance, with the towed source’s trajectory appearing the clearest and most continuous in Figure 17b. V1 follows as the second-best model. By contrast, V3 demonstrates the poorest performance, as shown in Figure 17c, where the trajectories of both the towed and interfering sources are the least distinct.
Although CBF provides the trajectory of the sound sources, it exhibits significant sidelobes and strong spurious peaks (mirror peaks) at angles symmetrical about the end fire direction.
Figure 18 presents the DOA prediction results of V1, V2, and V3 under the conditions where 6 frequency points are masked, alongside the results of MUSIC without frequency masking. From Figure 18a,c it can be observed that, compared to the unmasked condition, V1 demonstrates a more continuous and clearer trajectory for the target source than V3. However, the results of V1 contain a higher number of spurious points. By contrast, Figure 18b shows that V2 produces the fewest spurious points among the three networks, making it the best-performing model overall.
In Figure 18d, MUSIC, which performs DOA estimation using 10 snapshots, provides a very clear trajectory for the interfering source. However, it struggles to balance the relationship between the target source and the interfering source. Additionally, MUSIC fails to localize the towed source effectively when it is in the end fire direction of the array.
These results are consistent with the simulation findings, further indicating that the frequency-coherent V3 network performs poorly in localizing the single-frequency towed source (79 Hz) under broadband interference. The performance of V3 improves only when certain characteristic frequencies of the interfering source are replaced with white noise. By contrast, V2 demonstrates robust detection capability for single-frequency target sources within the frequency band. Regardless of changes in the frequency of the interfering source, V2 effectively balances the detection of single-frequency target sources with other interfering signals.

4. Conclusions

To address the limitations of existing frequency-coherent neural network structures in estimating the direction of arrival (DOA) of single-frequency sound sources under underwater broadband interference, this paper proposes the Parallel Net model. The model employs parallel GFGRU networks to independently extract information from each frequency point in a decoupled manner, followed by the use of an attention mechanism for multi-frequency information fusion.
Through simulations and analyses of array-received data from the SWellEx-96 experiment, with partial frequencies replaced by white noise, the DOA estimation performance of the proposed model and existing models was evaluated. The simulations considered scenarios with two or three sound sources of identical frequencies and three sound sources with distinct frequencies arriving from different angles.
The results demonstrate that Parallel Net improves DOA estimation accuracy for single-frequency sound sources under broadband interference compared to frequency-coherent neural network methods. Specifically, when the SNR > −10 dB, the MAE for single-frequency sources remains below 2°, outperforming frequency-coherent neural networks and reaching only half of the error of CBF. Validation using the SWellEx-96 experiment further confirmed the model’s robustness in detecting single-frequency targets under broadband interference. Parallel Net exhibits superior sidelobe suppression and fewer spurious peaks compared to CBF, achieves higher detection accuracy than MUSIC, and produces smoother and more continuous DOA trajectories than conventional neural network models.
Significantly, we observed that CBF delivers markedly more stable predictions for the 79 Hz single-frequency source 2 under challenging low-SNR conditions (<−10 dB). By contrast, MUSIC exhibits distinct advantages in suppressing mirror peaks. Future research will focus on developing a hybrid framework that integrates conventional methods (CBF and MUSIC) with neural networks to synergistically fuse their predictions, thereby enhancing source detection performance in low-SNR conditions and complex acoustic environments.

Author Contributions

These authors contributed equally to this work: Z.Y. and X.Z.; Conceptualization, Z.Y. and M.C.; methodology, Z.Y.; software, Z.Y., X.Z. and M.C.; validation, Z.Y., M.C. and X.Z.; formal analysis, Z.Y. and X.L.; writing—original draft preparation, Z.Y. and X.L.; writing—review and editing, Z.L.; visualization, Z.Y.; supervision, T.S. and Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Part of the code and the raw data that have been analyzed in the manuscript are available in https://blog.csdn.net/YANGN1?type=blog (accessed on 20 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1. Gating Mechanism of GFGRU

The gating mechanism of the GFGRU can be expressed by the following equations:
z_t = \sigma(W_z x_t + U_z h_{t-1})
r_t = \sigma(W_r x_t + U_r h_{t-1})
\tilde{h}_t^{\,j} = \tanh\!\left( W^{j-1 \to j} h_t^{\,j-1} + r_t^{\,j} \odot \sum_{i=1}^{L} g^{\,i \to j} \, U^{\,i \to j} h_{t-1}^{\,i} \right)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
(A1)
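The gating above could be realized as in the following PyTorch sketch of a single gated-feedback layer; the parameterization of the global reset gates g^{i→j} follows Chung et al. [11], and the exact dimensions are assumptions.

```python
import torch
import torch.nn as nn

class GFGRULayer(nn.Module):
    """One layer of a gated-feedback GRU (a sketch of Eq. (A1); dimensions are assumptions).

    Global reset gates g^{i->j} let layer j adaptively read the previous-step states of
    all L stacked layers, following Chung et al. [11].
    """
    def __init__(self, in_dim, hid_dim, n_layers):
        super().__init__()
        self.Wz, self.Uz = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)
        self.Wr, self.Ur = nn.Linear(in_dim, hid_dim), nn.Linear(hid_dim, hid_dim, bias=False)
        self.W = nn.Linear(in_dim, hid_dim)                    # W^{j-1 -> j}
        self.U = nn.ModuleList(nn.Linear(hid_dim, hid_dim, bias=False) for _ in range(n_layers))
        self.g = nn.ModuleList(nn.Linear(in_dim + n_layers * hid_dim, 1) for _ in range(n_layers))

    def forward(self, x, h_prev, h_prev_all):
        # x: input from the layer below; h_prev: this layer's state at t-1;
        # h_prev_all: list of every layer's state at t-1
        z = torch.sigmoid(self.Wz(x) + self.Uz(h_prev))
        r = torch.sigmoid(self.Wr(x) + self.Ur(h_prev))
        h_cat = torch.cat([x] + h_prev_all, dim=-1)
        feedback = sum(torch.sigmoid(g(h_cat)) * U(h_i)        # global reset gates g^{i->j}
                       for g, U, h_i in zip(self.g, self.U, h_prev_all))
        h_tilde = torch.tanh(self.W(x) + r * feedback)
        return (1 - z) * h_prev + z * h_tilde                  # gated state update
```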

Appendix A.2. Structure of GFGRU

Figure A1 shows the cell of the GFGRU at time T = t, which consists of two layers.
Figure A1. Structure of the GFGRU unit at time T = t.

Appendix A.3. Complexity Evaluations for Different Methods

Table A1 shows the complexity evaluation of the different methods in this paper, including the computational cost (MACs) and the number of parameters of each model. Additionally, for an intuitive comparison of the three models, Table A1 provides the running time on an RTX 960M GPU (4 GB of memory) and the maximum batch size under the 4 GB memory constraint. Since V1 and V2 involve a loop over each frequency point, their processing times are almost 10 times that of V3.
Table A1. Complexity evaluations for different methods.

Models | MACs (G) | Params (M) | Running Time | Max Batch Size
V1     | 2.73     | 167.6      | 1.01 s       | 120
V2     | 2.73     | 167.6      | 1.01 s       | 120
V3     | 1.28     | 117.7      | 0.12 s       | 1600

Appendix B

Visualization of DOA Estimation Results: CBF vs. MUSIC

Figure A2 and Figure A3 visualize the DOA estimation results of CBF and MUSIC at SNR = −20 dB in the simulation. They display cases with 0 and 6 masked frequency points. The scenarios include: “two identical sound sources”, “three identical sound sources”, and “three distinct sound sources”. The vertical axis represents sample indices, while the horizontal axis denotes angle values.
Under −20 dB SNR conditions, CBF demonstrates stable source 2 detection regardless of frequency masking, although it is significantly affected by mirror peaks. By contrast, while MUSIC exhibits less severe mirror-peak effects, it completely loses source 2 detection capability when 6 frequency points are masked.
Figure A2. CBF’s DOA estimation results for two and three sound sources with 0 or 6 frequency points masked at SNR = −20 dB.
Figure A3. MUSIC’s DOA estimation results for two and three sound sources with 0 or 6 frequency points masked at SNR = −20 dB.

Appendix C

MAE and RMSE of Each Model

Table A2, Table A3, Table A4, Table A5, Table A6, Table A7, Table A8 and Table A9 present the MAE and RMSE of each model in the simulation tests from Section 3.2 and Section 3.3. Specifically, Table A2, Table A3 and Table A4 show the MAE of each model, while Table A5, Table A6 and Table A7 show the RMSE of each model. In particular, Table A8 and Table A9 provide the MAE and RMSE of each model for the prediction of sound source 2 under three sound sources with distinct frequency bands in Section 3.3.
Table A2. MAE vs. SNR for two identical sources with frequency masking: V1, V2, and V3. Columns are grouped by SNR (−20, −15, −10, −5, 0, 5, and 10 dB); within each SNR group, the four sub-columns correspond to 0, 2, 4, and 6 masked frequency points.
V10.761.271.542.160.370.440.511.450.280.330.351.500.280.280.310.480.300.280.320.500.280.310.320.460.290.320.330.37
V20.530.710.781.310.180.220.280.360.070.090.120.130.040.060.060.080.040.040.050.060.030.030.050.060.030.040.040.05
V30.210.280.260.750.10.130.190.290.070.080.140.170.060.080.120.160.060.090.130.140.070.080.130.130.070.080.130.14
The bolded values represent the minimum values.
Table A3. MAE vs. SNR for three identical sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V12.813.295.287.651.591.31.93.460.390.450.982.610.350.621.192.220.720.451.032.140.710.70.922.360.710.511.192.81
V22.22.684.285.180.761.061.151.880.310.450.670.870.330.330.871.070.290.50.790.890.320.40.951.140.360.411.020.78
V30.190.250.361.010.130.150.190.380.140.140.160.30.130.140.140.210.130.130.120.190.140.140.120.190.140.140.120.18
CBF2.472.362.292.282.542.552.472.172.252.392.412.122.462.142.241.982.442.422.442.152.462.422.452.002.462.412.442.01
MUSIC2.062.172.432.572.412.102.232.462.052.412.502.562.052.142.112.231.952.162.362.512.011.982.122.422.042.062.092.63
The bolded values represent the minimum values.
Table A4. MAE vs. SNR for three distinct sound sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V12.744.0312.88.920.430.987.845.620.170.595.994.650.130.194.813.780.130.154.883.740.120.225.143.90.130.174.924.09
V21.051.8611.66.90.080.123.512.020.020.030.740.380.010.020.410.530.010.010.660.470.010.010.430.280.010.020.620.12
V32.756.916.614.91.986.2216.514.91.765.7116.914.62.385.691713.92.015.7317.114.81.685.4917.714.71.965.6517.515.1
CBF1.871.761.701.661.911.931.861.551.641.761.801.511.831.521.641.371.821.801.831.531.851.801.841.391.851.801.841.39
MUSIC2.694.5110.79.151.632.359.016.801.441.608.616.171.381.567.695.931.101.248.125.011.011.127.765.561.061.467.555.32
The bolded values represent the minimum values.
Table A5. RMSE vs. SNR for two identical sources with frequency masking: V1, V2, and V3. Column layout as in Table A2.
V13.197.065.959.890.931.151.828.180.760.850.9011.60.780.780.921.960.810.750.922.560.830.860.931.670.850.880.940.96
V21.523.053.086.60.550.750.911.150.330.380.420.470.20.250.260.310.20.210.220.250.170.180.220.270.160.190.200.23
V30.591.050.644.680.320.40.510.680.260.290.430.460.240.280.410.450.240.320.380.420.270.290.390.40.260.270.380.45
The bolded values represent the minimum values.
Table A6. RMSE vs. SNR for three identical sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V114.316.122.725.112.89.0111.616.63.722.827.513.43.054.567.1411.56.013.167.47126.015.285.7212.26.013.437.4513.9
V211.714.320.722.45.977.217.3313.12.252.975.576.732.442.357.67102.263.917.277.722.52.98.329.922.732.948.357.31
V30.741.182.056.020.560.50.722.30.750.630.771.430.730.720.570.890.730.560.40.780.730.720.390.910.740.720.40.81
CBF9.587.86.556.6410.610.610.15.518.79.519.365.5810.17.538.283.6610.110.010.05.6810.11010.13.8210.110.010.13.84
MUSIC5.998.118.517.959.496.497.378.036.099.3310.39.286.316.786.336.356.097.48.679.046.86.597.218.686.796.917.728.56
The bolded values represent the minimum values.
Table A7. RMSE vs. SNR for three distinct sound sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V113.816.323.4212.686.8517.515.30.774.71512.60.440.9413.511.40.430.4413.310.80.411.2613.811.90.440.7713.311.7
V27.678.7522.518.60.360.6211.37.610.150.185.183.340.10.113.514.270.070.084.523.680.10.093.982.960.110.124.811.17
V313.515.423.222.810.61522.923.11114.522.922.211.71423.321.710.714.423.122.69.4114.123.922.410.414.523.723
CBF9.27.46.26.310.210.39.75.18.39.19.05.29.77.27.93.39.79.79.75.39.79.79.73.59.79.79.73.5
MUSIC9.611.816.717.27.89.017.413.77.27.015.313.76.86.413.813.04.55.115.211.54.24.813.512.44.46.013.311.2
The bolded values represent the minimum values.
Table A8. MAE vs. SNR of sound source 2 for three distinct sound sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V14.29.737.724.00.42.423.215.90.21.517.713.40.20.414.210.90.10.214.410.80.10.415.211.10.20.314.511.9
V21.445.0334.519.70.10.1910.45.880.030.032.151.0400.011.21.54001.931.330.0101.250.750.010.011.810.3
V35.4520.349.644.14.7418.549.544.33.71750.543.56.1116.950.841.64.9517.151.444.23.8716.453444.7316.852.645.1
CBF2.23.817.363.012.12.664.412.362.053.314.752.472.163.544.762.412.183.234.82.382.163.114.662.422.193.294.682.32
MUSIC3.3610.930.123.61.624.9324.818.11.942.924.116.71.422.9421.415.81.161.6822.412.91.011.4621.814.81.072.4120.914
The bolded values represent the minimum values.
Table A9. RMSE vs. SNR of sound source 2 for three distinct sound sources with frequency masking: evaluation of various methods. Column layout as in Table A2.
V121.234.967.248.41.816.451.540.30.612.844.235.80.52.139.533.00.50.638.931.40.52.140.632.00.51.539.134.2
V29.5623.966.149.60.461.2333.322.10.170.2215.29.50.050.0910.312.50.050.0513.410.70.07011.78.540.070.0714.23.18
V323.844.868.466.624.644.368.366.721.64368.566.127.841.369.664.624.842.66967.420.741.971.466.823.74370.968.4
CBF9.4613.419.112.19.3111.914.210.58.8714.315.81110.515.516.510.710.514.316.610.510.313.915.810.710.714.315.810.2
MUSIC12.726.244.638.28.021941.333.211.813.938.733.67.861436.732.25.887.7138275.067.1837.7315.8711.636.228.5
The bolded values represent the minimum values.

References

  1. Bartlett, M.S. Properties of Sufficiency and Statistical Tests. Proc. R. Soc. Lond. 1937, 160, 268–282. [Google Scholar] [CrossRef]
  2. Niu, H.; Reeves, E.; Gerstoft, P. Source Localization in an Ocean Waveguide Using Supervised Machine Learning. J. Acoust. Soc. Am. 2017, 142, 1176–1188. [Google Scholar] [PubMed]
  3. Huang, Z.; Xu, J.; Gong, Z.; Wang, H.; Yan, Y. Source Localization Using Deep Neural Networks in a Shallow Water Environment. J. Acoust. Soc. Am. 2018, 143, 2922–2932. [Google Scholar] [CrossRef] [PubMed]
  4. Waibel, A.; Hanazawa, T.; Hinton, G.; Shikano, K.; Lang, K.J. Phoneme Recognition Using Time-Delay Neural Networks. IEEE Trans. Acoust. 1989, 37, 328–339. [Google Scholar] [CrossRef]
  5. Huang, Z.; Xu, J.; Gong, Z.; Wang, H.; Yan, Y. Multiple Source Localization in a Shallow Water Waveguide Exploiting Subarray Beamforming and Deep Neural Networks. Sensors 2019, 19, 4768. [Google Scholar] [CrossRef] [PubMed]
  6. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  7. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  8. Liu, Y.-N.; Niu, H.-Q.; Li, Z.-L. Source Ranging Using Ensemble Convolutional Networks in the Direct Zone of Deep Water. Chin. Physics Lett. 2019, 36, 044302. [Google Scholar] [CrossRef]
  9. Niu, H.; Gong, Z.; Ozanich, E.; Gerstoft, P.; Wang, H.; Li, Z. Deep-Learning Source Localization Using Multi-Frequency Magnitude-Only Data. J. Acoust. Soc. Am. 2019, 146, 211–222. [Google Scholar] [CrossRef] [PubMed]
  10. Liu, Y.; Niu, H.; Li, Z. A Multi-Task Learning Convolutional Neural Network for Source Localization in Deep Ocean. J. Acoust. Soc. Am. 2020, 148, 873–883. [Google Scholar] [PubMed]
  11. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Gated Feedback Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2067–2075. [Google Scholar]
  12. Liu, Y.; Niu, H.; Yang, S.; Li, Z. Multiple Source Localization Using Learning-Based Sparse Estimation in Deep Ocean. J. Acoust. Soc. Am. 2021, 150, 3773–3786. [Google Scholar] [CrossRef] [PubMed]
  13. Xiao, X.; Wang, W.; Ren, Q.; Gerstoft, P.; Ma, L. Underwater Acoustic Target Recognition Using Attention-Based Deep Neural Network. JASA Express Lett. 2021, 1, 106001. [Google Scholar] [CrossRef] [PubMed]
  14. Xiao, X.; Wang, W.; Ren, Q.; Zhao, M.; Ma, L. Source Ranging Using Attention-Based Convolutional Neural Network. In Proceedings of the 2021 OES China Ocean Acoustics (COA), Harbin, China, 14–17 July 2021; pp. 1038–1042. [Google Scholar]
  15. Xie, Y.; Wang, B. Direction-of-Arrival Estimation Method Based on Neural Network with Temporal Structure for Underwater Acoustic Vector Sensor Array. Sensors 2023, 23, 4919. [Google Scholar] [CrossRef] [PubMed]
  16. Ozanich, E.; Gerstoft, P.; Niu, H. A Feedforward Neural Network for Direction-of-Arrival Estimation. J. Acoust. Soc. Am. 2020, 147, 2035–2048. [Google Scholar] [CrossRef] [PubMed]
  17. Li, P.; Tian, Y. DOA Estimation of Underwater Acoustic Signals Based on Deep Learning. In Proceedings of the 2021 2nd International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT), Shanghai, China, 15–17 October 2021; pp. 221–225. [Google Scholar]
  18. Li, X.; Chen, J.; Bai, J.; Ayub, M.S.; Zhang, D.; Wang, M.; Yan, Q. Deep Learning-Based DOA Estimation Using CRNN for Underwater Acoustic Arrays. Front. Mar. Sci. 2022, 9, 1027830. [Google Scholar] [CrossRef]
  19. Jesus, S.M. Broadband Matched-Field Processing of Transient Signals in Shallow Water. J. Acoust. Soc. Am. 1993, 93, 1841–1850. [Google Scholar] [CrossRef]
  20. He, J.; Zhang, B.; Liu, P.; Li, X.; Wang, L.; Tang, R. Effective Underwater Acoustic Target Passive Localization of Using a Multi-Task Learning Model with Attention Mechanism: Analysis and Comparison under Real Sea Trial Datasets. Appl. Ocean Res. 2024, 150, 104072. [Google Scholar] [CrossRef]
  21. Wu, L.; Fu, Y.; Yang, X.; Xu, L.; Chen, S.; Zhang, Y.; Zhang, J. Research on the Multi-Signal DOA Estimation Based on ResNet with the Attention Module Combined with Beamforming (RAB-DOA). Appl. Acoust. 2025, 231, 110541. [Google Scholar] [CrossRef]
  22. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  23. Xie, S.; Girshick, R.; Dollar, P.; Tu, Z.; He, K. Aggregated Residual Transformations for Deep Neural Networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  24. Michalopoulou, Z.-H.; Porter, M.B. Matched-Field Processing for Broad-Band Source Localization. IEEE J. Ocean. Eng. 1996, 21, 384–392. [Google Scholar] [CrossRef]
  25. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
  26. Cho, K.; van Merrienboer, B.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation: Encoder-Decoder Approaches. arXiv 2014, arXiv:1409.1259. [Google Scholar]
  27. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning, 2nd ed.; Springer: New York, NY, USA, 2009. [Google Scholar]
  28. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  29. Murray, J. The SWellEx-96 Experiment. Available online: https://swellex96.ucsd.edu/events.htm (accessed on 2 August 2024).
  30. DeFatta, D.J.; Lucas, J.G.; Hodgkiss, W.S. Digital Signal Processing: A System Design Approach; John Wiley & Sons: New York, NY, USA, 1988. [Google Scholar]
  31. Van Trees, H.L. Optimum Array Processing: Part IV of Detection, Estimation, and Modulation Theory; John Wiley & Sons: New York, NY, USA, 2002. [Google Scholar]
Figure 1. Signal incidence for array reception.
Figure 2. Architecture of Parallel Net.
Figure 3. Details of V1, V2, and V3 network models.
Figure 4. Frequency range representation of three sound sources in testing set 3.
Figure 5. Illustration of masked frequency points.
Figure 6. MAE vs. SNR for two identical sources with frequency masking: V1, V2, and V3.
Figure 7. RMSE vs. SNR for two identical sources with frequency masking: V1, V2, and V3.
Figure 8. MAE vs. SNR for three identical sources with frequency masking: V1, V2, and V3.
Figure 9. RMSE vs. SNR for three identical sources with frequency masking: V1, V2, and V3.
Figure 10. DOA prediction results for two and three identical sources with 6 frequency points masked at SNR = −20 dB: V1, V2, and V3.
Figure 11. MAE vs. SNR for three distinct sound sources with frequency masking: V1, V2, and V3.
Figure 12. RMSE vs. SNR for three distinct sound sources with frequency masking: V1, V2, and V3.
Figure 13. DOA prediction results for three sources with different frequencies and 2 or 6 frequency points masked at SNR = −20 dB: V1, V2, and V3.
Figure 14. MAE and RMSE vs. SNR of sound source 2 for three distinct sound sources with 4 frequency points masked: evaluation of various methods.
Figure 15. Trajectories of the SWellEx-96 experiment: Event-S59 sound source and interfering source. The green pentagram at the center marks the location of the HLA North array.
Figure 16. HLA North array location in the SWellEx-96 experiment.
Figure 17. DOA prediction results for V1, V2, V3, and CBF without frequency masking.
Figure 18. DOA prediction results for V1, V2, and V3 with 6 frequency points masked, and MUSIC without frequency masking.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

