Article

TF-REF-RNN: Time-Frequency and Reference Signal Feature Fusion Recurrent Neural Network for Underwater Backscatter Signal Separation

by Jun Liu *, Shenghua Gong, Tong Zhang, Zhenxiang Zhao, Hao Dong and Jie Tan

School of Electronic Information Engineering, Beihang University, Beijing 100083, China

* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(19), 3635; https://doi.org/10.3390/rs16193635
Submission received: 28 August 2024 / Revised: 24 September 2024 / Accepted: 27 September 2024 / Published: 29 September 2024

Abstract

Underwater wireless sensor networks play an important role in exploring the oceans as part of an integrated space–air–ground–ocean network. Because energy is scarce underwater, the endurance of the battery largely determines how efficiently the equipment can operate. Underwater backscatter technology does not require batteries and, owing to its passive nature, has significant potential in positioning, navigation, communication, and sensing. However, underwater backscatter signals are easily swamped by the excitation signal. Additionally, the signals reflected from different nodes share the same frequency, overlap in time, and contain few useful features, which makes detection challenging. To solve these problems, this paper proposes a recurrent neural network that introduces time-frequency and reference signal features for underwater backscatter signal separation (TF-REF-RNN). In the feature extraction part, we design an encoder that introduces time-frequency domain features to learn more frequency details. To further improve performance, we design a separator that incorporates the clean global features of a reference signal. On a dataset that combines shipsEar noise data with simulated underwater backscatter signals, the proposed TF-REF-RNN model achieves 28.55 dB SI-SNRi and 19.51 dB SDRi, outperforming comparable classical methods.

1. Introduction

As illustrated in Figure 1, the integrated space–air–ground–ocean network can provide new service paradigms, realize a global network with seamless coverage, and integrate multidimensional heterogeneous resources. The ocean provides valuable resources, including minerals, renewable energy, medicine, and fresh water, and its study and exploration are highly significant for human development [1]. Underwater wireless sensor network technology has greatly accelerated the exploration of the ocean by enabling real-time monitoring of marine environmental parameters and the recording of marine biological information [2]. The network’s sensor nodes can gather and analyze data according to specific application scenarios and then transmit the results to an underwater autonomous vehicle or base station for further investigation via long-distance communication [3]. However, the complex marine environment poses significant challenges to the normal operation of underwater sensors. Replenishing energy underwater is expensive and challenging, and the effectiveness of underwater devices in gathering environmental data, communication, localization, and navigation depends heavily on battery endurance. The advent of backscatter technology partly resolves this issue [4]. Backscatter is a passive technique that communicates by reflecting signals already present in the environment. A backscatter device therefore does not require battery replacement during normal operation, which greatly extends its service life. At the same time, backscatter devices are low-cost and can be deployed on a large scale, giving them great application potential in underwater sensor networks.
The concept of backscatter communication was first introduced in 1948. This technology enables tags to communicate by reflecting incident RF signals without actively generating their own signals [5]. A backscatter system operates in three distinct phases: energy collection, data reflection transmission, and information demodulation [6]. Backscatter systems can be classified into three categories according to their architectures: monostatic systems, bistatic systems, and ambient backscatter systems [7]. Owing to their diverse application scenarios, multiple operating frequencies, long working distances, flexible data transmission, and sustainable operation, ambient backscatter systems have attracted great interest among researchers [8]. Backscatter communication techniques have been extensively researched on land, covering aspects such as communication, coding, security, and operating distance. The study in [9] focuses on the bistatic scenario and introduces artificial noise into the incident carrier to enable covert communication in backscatter systems. In backscatter systems, efficient channel coding and decoding algorithms are crucial for achieving the desired communication objectives; the analysis in [10] examines the constraints of current backscatter-coded communication approaches. The bit error rate and packet loss of backscatter signals rise significantly as the operating distance increases. In [11], a long-range ultra-low-power demodulator is proposed that can effectively demodulate signals from remote nodes at moderate power consumption and exhibits improved gain within the demodulation range. The work in [12] proposes a collaboration-centric approach to spectrum sharing for allocating resources and associating users in large-scale backscatter vehicular networks.
Underwater backscatter, a nascent technique, has undergone extensive research and achieved significant advancements in recent years. Compared with terrestrial environments, electromagnetic waves experience far more severe attenuation underwater, so communication in water mostly relies on acoustic waves [13]. The concept of the underwater backscatter node was first introduced in [14]. The node uses the piezoelectric effect to harvest acoustic energy from the underwater acoustic environment; this energy drives a microprocessor-controlled switch, allowing signals to be manipulated by either reflecting them or not. The work in [15] furthers this research with a study on the implementation of new backscatter nodes, replicating the operational range and energy dissipation of an underwater backscatter node using the parameters of a real transducer. The study in [16] examined the use of lightweight deep neural networks and empirically confirmed the practicality of employing small machine learning networks in underwater backscatter nodes. A method for transmitting small images via underwater backscatter nodes was proposed in [17], and hardware circuits were built to verify it. The study in [18] thoroughly examined signal transmission in underwater backscatter and addressed underwater channel estimation by employing a gradient projection algorithm with sparse reconstruction. The work in [19] examined the operational constraints of backscatter nodes by developing a theoretical link-budget model and investigated, at a macroscopic level, energy harvesting in the downlink and signal communication in the uplink. The study in [20] examined how multipath, latency, and mobility affect the performance of underwater backscatter nodes for positioning and navigation purposes.
With the development of underwater backscatter technology, significant progress has been made in communication distance, throughput, underwater acoustic channel research, and information collection. At present, passive communication with a throughput of 20 kbps at a distance of 30 m has been achieved, and the methods and scope of information collection have been qualitatively improved. In terms of positioning, underwater backscatter equipment has been used to achieve one-dimensional centimeter-level positioning.
The recent development of underwater backscatter technology offers greater potential for the advancement of underwater IoT technology, particularly in the domains of underwater mobile device positioning and navigation, as well as oceanic information detection and gathering. Underwater mobile robots can benefit from the low cost and quick deployment of nodes to obtain location information, which aids in localization and navigation. Furthermore, the excitation source can be continuously activated to capture sensor information from a considerable distance without requiring active measures. This effectively minimizes the expenses associated with a range of underwater devices [21].
Establishing a stable communication link with underwater backscatter nodes requires correctly decoding the underwater backscatter signals, and decoding in turn presupposes separating the signal from the complex hydroacoustic environment. However, because of the complexity of the underwater environment, hydroacoustic signals are often mixed with substantial noise and interference, making signal recognition challenging. To achieve robust signal decoding, the target signal must be separated from the received mixed signal [22].
In addition to the complex channel characteristics of hydroacoustic propagation, underwater backscatter signals must also contend with the homogeneity of multiple reflected signals and with being heavily swamped by the excitation signal. Because the signals reflected from several backscatter nodes share the same frequency and overlap in time, very few usable features remain.
The major goal of this work is to separate the reflected signals from the various nodes and remove the excitation signal from the received mixture in order to achieve better detection. The primary contributions of this paper are as follows: we investigate the mechanism of underwater backscatter communication; we introduce, for the first time, a theoretical model of backscatter signal separation based on the blind source separation technique; we implicitly estimate the channel features using a deep learning method; and we convert the problem of detecting multiple reflected signals into a signal separation problem. In this paper, TF-REF-RNN is built as an underwater backscatter signal separation network that introduces time-frequency and reference signal features. In the feature extraction part, an encoder that prioritizes frequency information is designed, which improves the model’s ability to extract features at the backscatter node’s operating frequency. Additionally, to help the network extract global characteristics and enhance the separation of multiple signal sources, this paper offers a backscatter signal separator that introduces a reference signal. Based on the shipsEar [23] dataset and simulated underwater backscatter signals, we performed separation studies of two reflected signals in noisy data. The results show that the signals separated by the proposed network structure suffer less interference from the excitation and noise signals, yielding higher performance. The network achieves 28.55 dB SI-SNRi and 19.51 dB SDRi, outperforming comparable classical methods.
The remainder of this article is organized as follows. Section 2 introduces the related work in this field. Section 3 describes the problem in the paper. Section 4 gives details on the TF-REF-RNN model. Section 5 introduces the experimental setup parameters. Section 6 analyzes and discusses the experimental results. Finally, the article is concluded in Section 7.

2. Related Work

2.1. Source Separation Method

The separation and extraction of target source signals from mixed signals is a fundamental task in signal processing known as source separation. Its primary challenges are differentiating source features, aligning source predictions with the correct targets (the permutation problem), and maintaining robustness to noise.
Classic methods for audio separation, like Deep Clustering [24], allocate embedding components to each time-frequency region of the spectrogram. This allows the separation labels for the target spectrum to be predicted implicitly from the mixed audio. It has proven effective in tests involving two or three sources, particularly when no prior information about the sources is available. Conv-TasNet [25] comprises three stages: an encoder, a separator, and a decoder. Signal features are extracted by applying a 1D convolution to the input time-domain signal, and the network employs a masking strategy to separate the signals based on the mask of each source. DPRNN [26] divides the input sequence into multiple overlapping chunks, which are then processed by RNNs to extract contextual features; residual connections allow improved learning of audio features. DPRNN comprises an encoder, a dual-path RNN module, and a decoder. The encoder converts the input waveform into a sequence of feature frames, and the dual-path RNN module processes these frames alternately along the intra-chunk (local) and inter-chunk (global) dimensions, leveraging bidirectional LSTM layers to capture both short-term and long-term dependencies. Finally, the decoder reconstructs the separated signals from the processed features. The results demonstrate that DPRNN performs exceptionally well in long-sequence modeling. A UNet-based audio separation technique, Wave-U-Net, which repeatedly resamples feature maps in the one-dimensional time domain, is proposed in [27]; this architecture enhances the model’s use of surrounding context and leads to favorable outcomes. Wavesplit [28] requires prior knowledge during training to extract and differentiate the characteristics of the input sources and then computes global vectors through clustering to represent and distinguish the features of each source, which helps overcome the output-ordering issue. SepFormer [29] addresses the inability of RNNs to be parallelized by building a multi-head attention mechanism, resulting in outstanding performance. However, the aforementioned methods primarily target the separation of music or speech signals. Underwater backscatter signals pose distinct difficulties, including their low intensity, the overlapping of multiple similar signals, and the significant interference caused by the excitation signal.

2.2. Underwater Acoustics Signal Separation

Blind source separation is a commonly employed and efficient technique for separating signals. A time-frequency domain source separation technique was proposed in [30]. It processes short-time Fourier transform amplitude features using a deep bidirectional long short-term memory (BLSTM) recurrent neural network, with the mean square error serving as the cost function; this was the first attempt to apply an ideal amplitude masking objective to hydroacoustic source separation. A network model combining convolutional and recurrent neural networks was proposed in [31], which compared the benefits and drawbacks of time-domain and frequency-domain separation and developed a novel approach for separating hydroacoustic signals using a hybrid coding module. The work in [32] proposed an enhanced blind source separation algorithm based on non-negative matrix factorization. The algorithm introduces the spatial and spectral correlation of hydroacoustic signals and resolves the non-convexity and eigencorrelation issues present in current non-negative matrix factorization algorithms. A two-stage deep learning-based hydroacoustic signal identification approach was proposed in [33]. The technique employs an encoder module to extract signal characteristics, a separation module to enhance the signal components, and a decoder module to reconstruct the transmitted signals; the experimental findings demonstrate that the model detects hydroacoustic signals effectively. The work in [34] employed an underwater blind source separation model to elucidate the nonlinear impact of bubble softening on ship noise. An end-to-end network integrating a BLSTM recurrent neural network and an attention mechanism was proposed to distinguish the radiated noise of the target ship from nonlinear mixed noise. The simulation findings demonstrated that the scheme outperforms several neural networks currently employed for linear and nonlinear blind source separation in terms of mean square error, correlation coefficient, and signal-to-distortion ratio. The study in [35] employed a highly randomized impulse signal as a weak acoustic source, used random noise and sinusoidal waveforms as background noise, and combined blind source separation with neural network methods to investigate the separation and identification of weak acoustic signals from the acquired observations. Underwater backscatter is an emerging technology for which signal separation has not yet been studied. This task has its own challenges, such as weak reflected signal strength, susceptibility to excitation and noise signals, multiple reflected signals of the same frequency overlapping in the time domain, and sparse useful features. We draw on the above work on acoustic signal separation to address the specific problem of acoustic backscatter.

3. Problem Description

3.1. Underwater Backscatter Mechanism

Underwater backscatter achieves passive operation based on the piezoelectric effect. The piezoelectric effect of the underwater transducer transforms the mechanical energy of the incident acoustic wave into electrical energy, and because the effect is reversible, electrical energy can also be converted back into acoustic energy. The underwater backscatter system, comprising an underwater backscatter node, a receiving end, and a transmitting end, is depicted in Figure 2. The underwater backscatter node specifically comprises piezoelectric material, an impedance matching circuit, an energy harvesting circuit, and a logic control circuit.
Define the excitation signal from the transmitting end of the underwater robot as s(t). It is first produced by the transmitting end and then travels through the water medium. The sound wave deforms the piezoelectric material at the underwater backscatter node, producing electrical energy as a result of the piezoelectric effect. Through the use of a specially designed impedance matching circuit, an energy harvesting circuit maximizes the electrical energy produced by the piezoelectric material. The energy that is captured can be used to power logic control circuits, which can then execute code to complete tasks such as reading sensor values. Additionally, the logic control circuit can control the on/off of an electronic switch based on the information to be delivered in cases where a reflected signal is required to carry information back. This is specifically achieved by managing the node’s change in reflection coefficient. The following equation provides the reflection coefficient:
$$\eta = \frac{Z_W - Z_L}{Z_W + Z_L} \tag{1}$$
In Equation (1), the reflection coefficient is denoted by η, the hydroacoustic impedance by Z_W, and the load impedance of the underwater backscatter node by Z_L. The activation and deactivation of the electronic switch change which impedance branch is connected to the circuit. When the electronic switch is open, the reflection coefficient approaches 0, allowing the node to efficiently capture energy. When the electronic switch is closed, the reflection coefficient tends toward 1, and the node reflects the excitation signal [14]. The receiver completes the process of receiving and interpreting the available data by demodulating the signal, which can carry information already stored in memory or captured in real time through sensors.
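To make the switching behavior concrete, the following minimal Python sketch evaluates Equation (1) for the two switch states; the impedance values are purely illustrative assumptions, not measured parameters of the node.

```python
import numpy as np

def reflection_coefficient(z_w, z_l):
    """Reflection coefficient of Equation (1): eta = (Z_W - Z_L) / (Z_W + Z_L)."""
    return (z_w - z_l) / (z_w + z_l)

Z_W = 1.5e6          # illustrative acoustic impedance of water (not a measured value)
Z_L_open = 1.5e6     # switch open: load matched to the water, so eta ~ 0 (energy harvesting)
Z_L_closed = 1.0e3   # switch closed: load close to a short circuit, so eta ~ 1 (reflecting)

print(abs(reflection_coefficient(Z_W, Z_L_open)))    # ~0.0
print(abs(reflection_coefficient(Z_W, Z_L_closed)))  # ~1.0
```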
In Figure 2, r(t) represents the backscatter node signal, and x(t) represents the signal received by the receiving end. It is interesting to note that underwater backscatter nodes reflect the signal omnidirectionally in the opposite phase when they receive the excitation signal from the transmitting end. The receiving end not only receives the reflected signal and ambient noise from the node, but also receives strong interference from the excitation signal of the transmitting end. Our goal is to separate the reflected signal of the underwater backscatter node from the signal received at the receiving end.

3.2. Description of Underwater Backscatter Signal Separation

Underwater backscatter technology enables sensor nodes to transmit ocean sensor information, aiding underwater robots in localization and navigation. Figure 3 depicts the operational environment of the underwater backscatter system, wherein the underwater robot can establish a communication process with the underwater backscatter node. When localization is required, the node can transmit pre-existing internal coordinate information to the underwater robot, using multiple nodes to help the underwater robot determine its own position. Nevertheless, the presence of multiple reflected signals will result in collisions with the robot, thereby posing a challenge for the underwater robot to differentiate between them. This paper concentrates on the issue of separating multiple reflected signals, with a particular emphasis on the separation of two underwater backscatter signals that are subject to the interference of strong excitation signals and noise signals.
When the node transmits the reflection information, the reflection signal generated at the underwater backscatter node is denoted as m(t):
$$m(t) = \eta\, b(t)\, s(t) \tag{2}$$
where η is the reflection coefficient and b(t) is the information to be transmitted.
Thus, the signal r(t) transmitted by the backscatter node to the receiving end is denoted as:
$$r(t) = m(t) \ast h(t) \tag{3}$$
where ∗ denotes convolution and h(t) is the channel response, which can be expressed as:
$$h(t) = h_s(t) \ast h_r(t) \tag{4}$$
In Equation (4), h_s(t) is the channel response from the underwater robot to the node, and h_r(t) is the channel response of the signal reflected from the node back to the underwater robot. Each part of the channel response represents the amplitude attenuation and phase delay of the signal during propagation, which directly affects the parameters of the signal.
The signal received by the receiving end is x(t). The received signal includes the excitation signal emitted by the transmitting end, several backscatter node signals, and noise signals, and can be represented by the following equation:
$$x(t) = s(t) + \sum_{k=1}^{N} r_k(t) + w(t) \tag{5}$$
where N is the number of received backscatter sources, k is the backscatter source index, and w(t) is the ambient noise.
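As a rough illustration of Equation (5), the following NumPy sketch builds a toy received mixture from an excitation tone, two delayed and attenuated OOK reflections, and additive noise. The sound speed of 1500 m/s, the amplitudes, and the idealized channel (pure delay and attenuation, no multipath or phase inversion) are assumptions for illustration only; the bit rate, carrier frequency, and sampling rate follow Table 1.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, fc, dur = 52_734, 15_000, 0.5            # sampling rate, carrier, duration from Table 1
c = 1500.0                                   # assumed underwater sound speed (m/s)
t = np.arange(int(fs * dur)) / fs

def ook_reflection(bits, bitrate, distance, atten):
    """r_k(t): a toy OOK-gated copy of the excitation tone, delayed by the round trip
    and attenuated; real channels add multipath and phase effects omitted here."""
    delay = 2 * distance / c
    idx = np.clip(((t - delay) * bitrate).astype(int), 0, len(bits) - 1)
    gate = bits[idx].astype(float)
    gate[t < delay] = 0.0                    # nothing arrives before the round-trip delay
    gate[t > delay + len(bits) / bitrate] = 0.0
    return atten * gate * np.sin(2 * np.pi * fc * (t - delay))

s = np.sin(2 * np.pi * fc * t)                                   # excitation s(t)
r1 = ook_reflection(rng.integers(0, 2, 414), 1034, 10.0, 0.05)   # node 1 at 10 m
r2 = ook_reflection(rng.integers(0, 2, 414), 1034, 15.0, 0.03)   # node 2 at 15 m
w = 0.02 * rng.standard_normal(len(t))                           # ambient noise w(t)
x = s + r1 + r2 + w                                              # Equation (5)
```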
The periodic sampling function p(t) is defined as:
$$p(t) = \sum_{n=-\infty}^{+\infty} \delta(t - nT) \tag{6}$$
where T is the sampling period. The result of impulse sampling of x(t) is:
$$x_s(t) = x(t)\, p(t) = \sum_{n=-\infty}^{+\infty} x(nT)\, \delta(t - nT) \tag{7}$$
To visualize how the frequency content of a time-domain signal changes over time, the short-time Fourier transform (STFT) provides a time-frequency analysis method built on the Fourier transform [36,37]. Specifically, the time-domain signal is divided into segments by a window function, a Fourier transform is applied to each segment to obtain its frequency components, and the results are stitched together to form a complete spectrogram. The STFT can be expressed by the following equation:
$$X(\tau, f) = \int_{-\infty}^{+\infty} x_{sw}(t)\, z(t - \tau)\, e^{-j 2\pi f t}\, dt \tag{8}$$
where x_sw(t) denotes the signal truncated by the window function, z(t) denotes the window function, f denotes the signal frequency, t denotes time, and τ denotes the time-domain position of the window.
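A short sketch of how the discrete STFT of the sampled mixture can be computed with SciPy; the FFT size and hop length follow the values reported later in Section 5.2, while the Hann window and the random stand-in signal are assumptions of this sketch.

```python
import numpy as np
from scipy.signal import stft

fs = 52_734
x = np.random.randn(int(0.5 * fs))           # stand-in for the sampled mixture x[n]
# nperseg = 1024 and hop = 520 follow Section 5.2; the Hann window is an assumption.
f, tau, X = stft(x, fs=fs, window="hann", nperseg=1024, noverlap=1024 - 520)
print(X.shape)                               # (frequency bins, time frames) fed to the STFT branch
```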
The primary aim of this paper is to analyze the characteristics of the signals received at the receiving end and to distinguish the underwater backscatter node signals by eliminating the impact of strong interference and noise from the excitation signal using the source separation technique.

4. Model Construction

As shown in Figure 4, the TF-REF-RNN network adopts the classic encoder, separator, and decoder architecture in the field of audio deep learning. The encoder is utilized to convert the mixed signals into distinct characteristics, and these transformed time-frequency domain features are then inputted into the separator to separate the backscatter signals according to their respective features. We introduce an underwater backscatter reference signal into the separator component to enhance the network’s focus on separating the underwater backscatter signal. Ultimately, the decoder reconstructs the clean backscatter signal independently. Further information regarding the encoder, separator, and decoder will be provided in the following section.

4.1. Encoder

In the task of underwater backscatter signal separation, the encoder extracts the features of each source in the mixed signal and converts them into high-dimensional embeddings. The details of the encoder proposed in this paper are as follows. The input mixed waveform is encoded by a 1D convolutional branch and an STFT branch for feature extraction, and the two outputs are finally concatenated and fed into the separator. The primary aim of the 1D convolutional branch is to increase the number of channels of the time-domain waveform using a single convolutional layer. It is represented by the following equation:
$$U = C(x(t)) \tag{9}$$
where U stands for the high-dimensional mixture feature and C represents the 1D convolution operation.
In the STFT branch, the mixed waveform is initially transformed into a spectrogram. Then, a convolutional layer is used to increase the number of channels. The equation that represents it is as follows:
$$E = C(X(\tau, f)) \tag{10}$$
where E denotes the high-dimensional features after STFT.
As shown in Figure 5, the high-dimensional features undergo average pooling, a fully connected layer, an activation, and a sigmoid for further feature extraction. In this branch, the neural network extracts more features from the signal at the frequencies of interest, suppressing the influence of irrelevant frequencies. The two branches are concatenated, and the number of channels is adjusted by a convolutional layer.
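The following PyTorch sketch shows one way such a two-branch encoder could be wired. The filter count, FFT size, and hop length follow Section 5.2, but the convolution kernel size, the squeeze-and-excitation layout, and the frame-rate alignment by interpolation are our assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class DualBranchEncoder(nn.Module):
    """Sketch of the two-branch encoder of Figure 5. Filter count, FFT size, and hop follow
    Section 5.2; kernel size, SE layout, and interpolation-based alignment are assumptions."""
    def __init__(self, n_filters=64, n_fft=1024, hop=520, reduction=16):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.conv_branch = nn.Conv1d(1, n_filters, kernel_size=16, stride=8, padding=4)
        self.spec_conv = nn.Conv1d(n_fft // 2 + 1, n_filters, kernel_size=1)
        # Channel attention: average pooling -> FC -> ReLU -> FC -> sigmoid (Figure 5).
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(n_filters, 256 // reduction), nn.ReLU(),
            nn.Linear(256 // reduction, n_filters), nn.Sigmoid())
        self.merge = nn.Conv1d(2 * n_filters, n_filters, kernel_size=1)

    def forward(self, x):                                         # x: (batch, samples)
        u = torch.relu(self.conv_branch(x.unsqueeze(1)))          # time-domain features U, Eq. (9)
        spec = torch.stft(x, self.n_fft, hop_length=self.hop,
                          window=torch.hann_window(self.n_fft, device=x.device),
                          return_complex=True).abs()              # |X(tau, f)|, Eq. (8)
        e = self.spec_conv(spec)                                  # frequency features E, Eq. (10)
        e = e * self.se(e).unsqueeze(-1)                          # reweight channels toward the operating band
        e = nn.functional.interpolate(e, size=u.shape[-1])        # align the two frame rates
        return self.merge(torch.cat([u, e], dim=1))               # fused embedding for the separator

feats = DualBranchEncoder()(torch.randn(2, 26367))                # (2, 64, frames)
```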

4.2. Underwater Backscatter Signal Separator

The underwater backscatter signal separator, which receives the high-dimensional embedding from the encoder and effectively separates the features of the different backscatter nodes, is depicted in Figure 6.
In this paper, we segment the feature output from the encoder into overlapping blocks and combine all of the blocks into a three-dimensional tensor. DPRNN blocks are mainly used in the separator to complete the separation [26]. By incorporating an additional reference signal feature branch into the separator, we help the network concentrate more on the underwater backscatter signal and prevent the strong excitation signal and noise from competing for attention. The reference signal is an arbitrary clean underwater backscatter signal, and we extract its features using a 1D convolution and an STFT, respectively.
We then concatenate these features with segmentation, pass them through the BLSTM layer, and sum the normalized result with the encoder’s output. When the separator is operating, it only inputs the feature information of the reference signal once. Its purpose is to introduce the template information of the underwater backscatter signal. Simultaneously, it sets the initial value for the DPRNN block, which speeds up the network’s convergence and enhances its performance in separating the underwater backscatter signal.
The mixed input completes intra-chunk and inter-chunk recurrent neural network modeling alternately within several DPRNN blocks and finally sends the output to the decoder.
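A minimal PyTorch sketch of the segmentation and reference-signal branch, assuming the channel and hidden sizes from Section 5.2; the chunk size and the fusion of the template vector by broadcast addition are assumptions about how "summing the normalized result with the encoder's output" is realized.

```python
import torch
import torch.nn as nn

def segment(feat, chunk=100, hop=50):
    """Split (batch, channels, frames) into overlapping chunks -> (batch, channels, n_chunks, chunk)."""
    pad = (hop - feat.shape[-1] % hop) % hop
    feat = nn.functional.pad(feat, (0, pad))
    return feat.unfold(2, chunk, hop)

class ReferenceInjection(nn.Module):
    """Reference-signal branch of Figure 6: a clean backscatter template is encoded, summarized
    by a BLSTM, normalized, and added once to the segmented mixture features before the DPRNN
    blocks. Fusing the template by broadcast addition is our assumption."""
    def __init__(self, channels=64, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, channels)
        self.norm = nn.LayerNorm(channels)

    def forward(self, mix_chunks, ref_feat):
        # mix_chunks: (B, C, n_chunks, chunk); ref_feat: (B, C, frames) from the same encoder.
        h, _ = self.blstm(ref_feat.transpose(1, 2))       # global context of the clean template
        g = self.norm(self.proj(h.mean(dim=1)))           # one template vector per batch item
        return mix_chunks + g[:, :, None, None]           # injected once, before the DPRNN blocks

enc_out = torch.randn(2, 64, 3295)                        # encoder output for the mixture
ref_out = torch.randn(2, 64, 3295)                        # encoder output for the clean reference
fused = ReferenceInjection()(segment(enc_out), ref_out)   # (2, 64, n_chunks, 100)
```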

4.3. Decoder

The decoder resynthesizes the time-domain signal of each source from the output features of the underwater backscatter signal separator. Through overlap-add, the final output of the separator is transformed from a three-dimensional tensor back into an ordered chronological sequence. During decoding, the decoder employs a 1D convolution to map the high-dimensional features back to the signal domain, so that each separated signal is restored to its original time-domain waveform.
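The overlap-add and waveform reconstruction steps can be sketched as follows; the transposed-convolution kernel and stride mirror the encoder sketch above and are assumptions, not the authors' exact settings.

```python
import torch
import torch.nn as nn

def overlap_add(chunks, hop=50):
    """Fold (batch, channels, n_chunks, chunk) back into (batch, channels, frames) by
    overlap-add, inverting the segmentation step (overlap normalization omitted)."""
    b, c, n, k = chunks.shape
    out = chunks.new_zeros(b, c, (n - 1) * hop + k)
    for i in range(n):                                   # a simple loop keeps the sketch readable
        out[:, :, i * hop:i * hop + k] += chunks[:, :, i]
    return out

# A transposed 1-D convolution mirrors the encoder's convolution and maps the masked
# features back to a single-channel waveform (kernel/stride values are assumptions).
decoder = nn.ConvTranspose1d(64, 1, kernel_size=16, stride=8, padding=4)
masked = torch.randn(2, 64, 65, 100)                     # toy separator output for one source
wave = decoder(overlap_add(masked))                      # (2, 1, samples)
```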

5. Experimental Procedures

5.1. Simulation Parameter Settings

In this paper, we use real hydroacoustic noise data combined with simulated backscatter signals to produce a dataset to complete the training, validation, and testing of the network. Each dataset consists of a backscatter signal and an interference signal, where the interference signal consists of an excitation signal s(t) and underwater ambient noise w(t). The backscatter signal r(t) is generated based on the underwater backscatter mechanism.
BS-2-mix is an underwater backscatter signal separation dataset constructed from simulated signals and the shipsEar dataset. In BS-2-mix, there are reflected signals from two nodes, excitation signals s(t), and underwater ambient noise signals w(t). When simulating signals, the spatial positions of the sensors and sources are considered, and the backscatter nodes are randomly placed between 10 and 15 m from the receiving end. The throughput of the nodes is set to 1034 bits per second, the modulation is on–off keying (OOK), and the signal length is set to 0.5 s. Following relevant research on underwater acoustic channels, we set the amplitude to attenuate by 20 dB every 30 milliseconds [38]. The time delay generated during the signal round-trip is calculated strictly from the underwater sound speed and the working distance. Additionally, Bi-Phase Space coding leverages both the time and frequency domains to represent the signals, enabling efficient processing and separation of the complex mixtures. A more detailed dataset setup can be seen in Table 1.
This paper utilizes the underwater ambient data obtained from shipsEar to produce underwater ambient noise signals w(t). ShipsEar is a dataset of hydroacoustic signals that includes the sounds produced by 11 ship engines as well as underwater ambient noise. These signals are acquired through the use of hydrophones. The position of the hydrophones during each recording is intended to optimize the acquisition of the sound of interest and to mitigate other noise.
The underwater ambient noise recordings in the shipsEar dataset are randomly cropped into 0.5 s slices and added to the simulated signals at a random signal-to-noise ratio between −5 dB and 0 dB. Detailed noise data are listed in Table 2. Approximately 27,000 mixtures are generated by combining the underwater ambient noise with the simulated underwater backscatter signals. The sampling frequency of the data is 52,734 Hz, and the data are divided into training, validation, and test sets in a ratio of 7:2:1.
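A simplified sketch of how one mixture of this kind could be assembled, assuming the soundfile package for reading the shipsEar recordings; the file path is hypothetical, and the clean component is a random stand-in for the simulated backscatter signals described above.

```python
import numpy as np
import soundfile as sf        # assumed audio I/O dependency for the shipsEar .wav files

fs = 52_734
rng = np.random.default_rng(0)

def crop_noise(path, length):
    """Randomly crop a noise slice from a shipsEar recording (assumed to be long enough
    and already at the 52,734 Hz sampling rate)."""
    noise, _ = sf.read(path)
    start = rng.integers(0, len(noise) - length)
    return noise[start:start + length]

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so the mixture reaches the requested SNR, then add it."""
    scale = np.sqrt(np.sum(signal ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return signal + scale * noise

length = int(0.5 * fs)
clean = rng.standard_normal(length)                     # stand-in for s(t) + sum_k r_k(t)
noise = crop_noise("shipsEar/85__E__1L.wav", length)    # hypothetical file path
snr = rng.uniform(-5.0, 0.0)                            # random SNR in [-5, 0] dB
mixture = mix_at_snr(clean, noise, snr)
```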

5.2. Model Configuration

We set the number of filters in the encoder to 64 and use 6 DPRNN blocks for signal separation. The number of hidden units is set to 128, and intra-chunk and inter-chunk BLSTM [39] layers are applied alternately within the DPRNN blocks. Layer normalization is chosen as the normalization method, and the number of underwater backscatter signals is set to 2. The decoder-related settings of the model are the same as the corresponding settings of the encoder.
For the STFT branch, NFFT is set to 1024, and hop length is set to 520. The size of the number of channels in the fully connected layer after the average pooling layer is set to 256, and the reduction is set to 16. The activation function uses ReLU, and finally, the importance coefficients of the individual channels are output using a sigmoid. The reference signal feature extraction process uses an encoder to perform a similar operation. It then uses a BLSTM with 128 hidden units and layer normalization to finish the template extraction and information injection.

5.3. Experiment Configuration

We trained for 51 epochs on the BS-2-mix dataset with an initial learning rate of 5 × 10−4, reduced by a factor of 0.5 every two epochs. Early stopping is applied if no better model is found on the validation set for 10 consecutive epochs. The optimizer is Adam with gradient clipping at a maximum L2 norm of 5. All models are trained with utterance-level permutation invariant training (uPIT) [40]. The effectiveness of the model is measured by the source-to-distortion ratio improvement (SDRi) and the scale-invariant signal-to-noise ratio improvement (SI-SNRi).
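A self-contained sketch of the optimization settings described above (Adam at 5 × 10−4, halving every two epochs, L2 gradient clipping at 5); the one-layer model and the dummy loss are placeholders so the snippet runs on its own and are not the actual TF-REF-RNN model or loss.

```python
import torch

# Placeholder module standing in for TF-REF-RNN so the snippet runs on its own.
model = torch.nn.Conv1d(1, 64, kernel_size=16, stride=8)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)  # halve every 2 epochs

loss = model(torch.randn(4, 1, 26367)).pow(2).mean()     # dummy forward pass and loss
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # gradient clipping, max L2 norm 5
optimizer.step()
scheduler.step()                                          # called once per epoch during training
```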

5.4. Indicators for Model Assessment

In this paper, we use the SI-SNR [41] as the loss function, defined as:
$$s_{\mathrm{target}} := \frac{\langle \hat{s}, s \rangle\, s}{\|s\|^2}, \qquad e_{\mathrm{noise}} := \hat{s} - s_{\mathrm{target}}, \qquad \mathrm{SI\text{-}SNR} := 10 \log_{10} \frac{\|s_{\mathrm{target}}\|^2}{\|e_{\mathrm{noise}}\|^2} \tag{11}$$
where s is the clean signal, ŝ is the estimated signal, ⟨·,·⟩ is the dot product of two vectors, and ‖·‖² denotes the squared L2 norm. Permutation invariant training is also used in this paper to resolve the ordering ambiguity of the separated outputs.
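A minimal PyTorch sketch of the SI-SNR loss combined with permutation invariant training for the two-source case; it follows the definition above, and the batch layout (batch, sources, samples) is an assumption of this sketch.

```python
import torch

def si_snr(est, ref, eps=1e-8):
    """SI-SNR as defined above; est and ref are waveforms of shape (batch, samples).
    (Many implementations also remove the mean of each signal first; omitted here
    to follow Equation (11) as written.)"""
    s_target = (torch.sum(est * ref, dim=-1, keepdim=True)
                / (torch.sum(ref ** 2, dim=-1, keepdim=True) + eps)) * ref
    e_noise = est - s_target
    return 10 * torch.log10(torch.sum(s_target ** 2, dim=-1)
                            / (torch.sum(e_noise ** 2, dim=-1) + eps))

def upit_si_snr_loss(est, ref):
    """uPIT for the two-source case: score both output-to-target assignments and keep
    the permutation with the higher mean SI-SNR (i.e., the lower negative loss)."""
    perm1 = (si_snr(est[:, 0], ref[:, 0]) + si_snr(est[:, 1], ref[:, 1])) / 2
    perm2 = (si_snr(est[:, 0], ref[:, 1]) + si_snr(est[:, 1], ref[:, 0])) / 2
    return -torch.maximum(perm1, perm2).mean()

est = torch.randn(4, 2, 26367)    # (batch, sources, samples): separator outputs
ref = torch.randn(4, 2, 26367)    # ground-truth backscatter signals
print(upit_si_snr_loss(est, ref))
```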

6. Results

6.1. Results on BS-2-Mix

We launch our tests on the BS-2-mix dataset described in Section 5.1. Table 3 compares the SI-SNRi and SDRi performance of the TF-REF-RNN network with the other methods on our dataset. The TF-REF-RNN network obtains the best performance, with 28.55 dB SI-SNRi and 19.51 dB SDRi, an improvement of 2.82 dB and 2.49 dB, respectively, over the unmodified DPRNN network.
At this stage, the underwater backscatter signal r(t) has been separated from the signal x(t) received at the receiving end, and the excitation signal s(t) and ambient noise signal w(t) have been removed. The signal r(t), as shown in Figure 2, contains both reflecting and non-reflecting intervals, corresponding to the two states of the backscatter working mechanism. To decode underwater backscatter signals reliably, the reflecting and non-reflecting states must be clearly distinguishable: in the non-reflecting state the signal energy should be close to 0, while in the reflecting state the short-term energy should differ significantly from that of the non-reflecting state and remain consistent.
To better analyze the separation effect of underwater backscatter signals, time-domain waveforms are shown in Figure 7. It shows a randomly selected mixed signal from the test set, which contains multiple components of the excitation interference signal, the reflection signal generated from two different nodes, and noise. Figure 8 and Figure 9 show fragments of the separation effect of TF-REF-RNN, DPRNN, and Conv-TasNet on underwater reflection signals 1 and 2, respectively.
Differences in the results of the various models on the same mixed signal can be observed at approximately 0.29 s in each panel of Figure 8, and at 0.21 s and 0.27 s in each panel of Figure 9. Based on these findings, the TF-REF-RNN network model proposed in this paper exhibits a superior separation effect, a better ability to differentiate between the reflecting and non-reflecting states, and stronger noise resistance.

6.2. Ablation Studies

We conducted ablation experiments using the TF-REF-RNN model and the BS-2-mix dataset. Table 4 illustrates the impact of incorporating the STFT branch encoder and the reference signal into the proposed underwater backscatter separator.
The STFT branch brings an increase of 1.87 dB in SI-SNRi and 1.54 dB in SDRi. This is because the backscatter signal operates at its own frequency, so the network can concentrate on the resonance frequency at which the underwater backscatter node operates and avoid interference from low-frequency noise. Furthermore, the incorporation of time-frequency information enables the network to pay more attention to the various harmonics produced by switching the backscatter state, enhancing the discrimination of signals at identical frequencies. After the global features of the reference signal are introduced, the SI-SNRi and SDRi each improve by a further 0.95 dB, and the newly introduced features allow the network to better distinguish the excitation signal from the backscatter signal.
An additional benefit of the reference signal feature is that it provides initial information for the subsequent RNN separation module, enabling the network to converge rapidly. For the network model, the reference signal feature acts as a template carrying universal information: it gives the network the ability to comprehend the pattern of the backscatter signal’s on–off keying and to reliably differentiate between similar signals. Owing to both of these factors, the network achieves a high performance of 28.55 dB SI-SNRi and 19.51 dB SDRi.

7. Conclusions

In this paper, we first construct a signal reception model for multiple backscatter nodes based on the working principle of backscatter. The TF-REF-RNN network model is proposed for the backscatter signal separation of underwater backscatter nodes. An STFT-based encoder introducing time-frequency information is designed in the feature extraction stage to allow the network to focus on the node’s operating frequency. A separator that introduces a global feature representation of the reference signal is proposed in the signal separation stage. To validate the model’s performance, we produced a BS-2-mix dataset based on shipsEar noise data and underwater simulated backscatter signals and performed test and ablation experiments on it. The experimental results show that the TF-REF-RNN model obtains a better performance of 28.55 dB SI-SNRi and 19.51 dB SDRi compared with similar classical network models.

Author Contributions

Conceptualization, J.L. and S.G.; methodology, S.G.; software, S.G.; validation, S.G., J.L., T.Z. and Z.Z.; formal analysis, S.G.; investigation, S.G., H.D. and J.T.; resources, S.G. and J.L.; data curation, S.G.; writing—original draft preparation, S.G.; writing—review and editing, S.G.; visualization, S.G.; supervision, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Joint Funds of the National Natural Science Foundation of China under Grant U22A2009 and in part by the National Key Research and Development Program under Grant 2021YFC2803000.

Data Availability Statement

The dataset provided in this article is not easily accessible due to policy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ali, M.F.; Jayakody, D.N.K.; Chursin, Y.A.; Affes, S.; Dmitry, S. Recent Advances and Future Directions on Underwater Wireless Communications. Arch. Comput. Methods Eng. 2020, 27, 1379–1412. [Google Scholar] [CrossRef]
  2. Luo, J.; Yang, Y.; Wang, Z.; Chen, Y. Localization Algorithm for Underwater Sensor Network: A Review. IEEE Internet Things J. 2021, 8, 13126–13144. [Google Scholar] [CrossRef]
  3. Su, X.; Ullah, I.; Liu, X.; Choi, D. A Review of Underwater Localization Techniques, Algorithms, and Challenges. J. Sens. 2020, 2020, 6403161. [Google Scholar] [CrossRef]
  4. Jiang, T.; Zhang, Y.; Ma, W.; Peng, M.; Peng, Y.; Feng, M.; Liu, G. Backscatter Communication Meets Practical Battery-Free Internet of Things: A Survey and Outlook. IEEE Commun. Surv. Tutor. 2023, 25, 2021–2051. [Google Scholar] [CrossRef]
  5. Jia, M.; Yao, C.; Liu, W.; Ye, R.; Juhana, T.; Ai, B. Sensitivity and Distance Based Performance Analysis for Batteryless Tags with Transmit Beamforming and Ambient Backscattering. China Commun. 2022, 19, 109–117. [Google Scholar] [CrossRef]
  6. Galisteo, A.; Varshney, A.; Giustiniano, D. Two to Tango: Hybrid Light and Backscatter Networks for next Billion Devices. In Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services, Toronto, ON, Canada, 15–19 June 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 80–93. [Google Scholar]
  7. Yao, C.; Liu, Y.; Wei, X.; Wang, G.; Gao, F. Backscatter Technologies and the Future of Internet of Things: Challenges and Opportunities. Intell. Converg. Netw. 2020, 1, 170–180. [Google Scholar] [CrossRef]
  8. Liu, V.; Parks, A.; Talla, V.; Gollakota, S.; Wetherall, D.; Smith, J.R. Ambient backscatter: Wireless communication out of thin air. ACM SIGCOMM Comput. Commun. Rev. 2013, 43, 39–50. [Google Scholar] [CrossRef]
  9. Wang, Y.; Yan, S.; Yang, W.; Huang, Y.; Liu, C. Energy-Efficient Covert Communications for Bistatic Backscatter Systems. IEEE Trans. Veh. Technol. 2021, 70, 2906–2911. [Google Scholar] [CrossRef]
  10. Rezaei, F.; Galappaththige, D.; Tellambura, C.; Herath, S. Coding Techniques for Backscatter Communications—A Contemporary Survey. IEEE Commun. Surv. Tutor. 2023, 25, 1020–1058. [Google Scholar] [CrossRef]
  11. Guo, X.; Shangguan, L.; He, Y.; Jing, N.; Zhang, J.; Jiang, H.; Liu, Y. Saiyan: Design and Implementation of a Low-Power Demodulator for {LoRa} Backscatter Systems. In Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation, Renton, WA, USA, 4–6 April 2022; pp. 437–451. [Google Scholar]
  12. Khan, W.U.; Nguyen, T.N.; Jameel, F.; Jamshed, M.A.; Pervaiz, H.; Javed, M.A.; Jäntti, R. Learning-Based Resource Allocation for Backscatter-Aided Vehicular Networks. IEEE Trans. Intell. Transp. Syst. 2022, 23, 19676–19690. [Google Scholar] [CrossRef]
  13. Quattrini Li, A.; Carver, C.J.; Shao, Q.; Zhou, X.; Nelakuditi, S. Communication for Underwater Robots: Recent Trends. Curr. Robot. Rep. 2023, 4, 13–22. [Google Scholar] [CrossRef]
  14. Jang, J.; Adib, F. Underwater Backscatter Networking. In Proceedings of the ACM Special Interest Group on Data Communication, Beijing, China, 19–23 August 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 187–199. [Google Scholar]
  15. Bereketli, A. Interference-free source deployment for coverage in underwater acoustic backscatter networks. Peer-to-Peer Netw. Appl. 2022, 15, 1577–1594. [Google Scholar] [CrossRef]
  16. Zhao, Y.; Afzal, S.S.; Akbar, W.; Rodriguez, O.; Mo, F.; Boyle, D.; Adib, F.; Haddadi, H. Towards Battery-Free Machine Learning and Inference in Underwater Environments. In Proceedings of the 23rd Annual International Workshop on Mobile Computing Systems and Applications, Tempe, AZ, USA, 9–10 March 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 29–34. [Google Scholar]
  17. Afzal, S.S.; Akbar, W.; Rodriguez, O.; Doumet, M.; Ha, U.; Ghaffarivardavagh, R.; Adib, F. Battery-Free Wireless Imaging of Underwater Environments. Nat. Commun. 2022, 13, 5546. [Google Scholar] [CrossRef] [PubMed]
  18. Hu, G.; Lin, J.; Wang, G.; He, R.; Wei, X. Sparse Reconstruction Based Channel Estimation for Underwater Piezo-Acoustic Backscatter Systems. In Proceedings of the 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), Helsinki, Finland, 25–28 April 2021; pp. 1–5. [Google Scholar]
  19. Akbar, W.; Allam, A.; Adib, F. The Underwater Backscatter Channel: Theory, Link Budget, and Experimental Validation. In Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, Madrid, Spain, 2–6 October 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1–15, ISBN 978-1-4503-9990-6. [Google Scholar]
  20. Ghaffarivardavagh, R.; Afzal, S.S.; Rodriguez, O.; Adib, F. Underwater Backscatter Localization: Toward a Battery-Free Underwater GPS. In Proceedings of the 19th ACM Workshop on Hot Topics in Networks, Virtual, 4–6 November 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 125–131. [Google Scholar]
  21. Zhang, L.; Wang, Z.; Zhang, H.; Min, M.; Wang, C.; Niyato, D.; Han, Z. Anti-Jamming Colonel Blotto Game for Underwater Acoustic Backscatter Communication. IEEE Trans. Veh. Technol. 2024, 73, 10181–10195. [Google Scholar] [CrossRef]
  22. Yin, F.; Li, C.; Wang, H.; Nie, L.; Zhang, Y.; Liu, C.; Yang, F. Weak Underwater Acoustic Target Detection and Enhancement with BM-SEED Algorithm. J. Mar. Sci. Eng. 2023, 11, 357. [Google Scholar] [CrossRef]
  23. Santos-Domínguez, D.; Torres-Guijarro, S.; Cardenal-López, A.; Pena-Gimenez, A. ShipsEar: An underwater vessel noise database. Appl. Acoust. 2016, 113, 64–69. [Google Scholar] [CrossRef]
  24. Hershey, J.R.; Chen, Z.; Le Roux, J.; Watanabe, S. Deep clustering: Discriminative embeddings for segmentation and separation. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 31–35. [Google Scholar]
  25. Luo, Y.; Mesgarani, N. Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation. IEEE/ACM Trans. Audio Speech Lang. Process. 2019, 27, 1256–1266. [Google Scholar] [CrossRef] [PubMed]
  26. Luo, Y.; Chen, Z.; Yoshioka, T. Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 46–50. [Google Scholar]
  27. Stoller, D.; Ewert, S.; Dixon, S. Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation. arXiv 2018, arXiv:1806.03185. [Google Scholar]
  28. Zeghidour, N.; Grangier, D. Wavesplit: End-to-End Speech Separation by Speaker Clustering. IEEE/ACM Trans. Audio Speech Lang. Proc. 2021, 29, 2840–2849. [Google Scholar] [CrossRef]
  29. Subakan, C.; Ravanelli, M.; Cornell, S.; Bronzi, M.; Zhong, J. Attention Is All You Need in Speech Separation. In Proceedings of the ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 21–25. [Google Scholar]
  30. Zhang, W.; Li, X.; Zhou, A.; Ren, K.; Song, J. Underwater Acoustic Source Separation with Deep Bi-LSTM Networks. In Proceedings of the 2021 4th International Conference on Information Communication and Signal Processing (ICICSP), Shanghai, China, 24 September 2021; pp. 254–258. [Google Scholar]
  31. Shi, Z.; Wang, K. Separation of Underwater Acoustic Signals Based on C-RNN Network. In Proceedings of the International Conference on Electronic Information Engineering, Big Data, and Computer Technology (EIBDCT 2022), Sanya, China, 20–22 January 2022; Volume 12256, pp. 35–40. [Google Scholar]
  32. Li, D.; Wu, M.; Yu, L.; Han, J.; Zhang, H. Single-Channel Blind Source Separation of Underwater Acoustic Signals Using Improved NMF and FastICA. Front. Mar. Sci. 2023, 9, 1097003. [Google Scholar] [CrossRef]
  33. Chu, H.; Li, C.; Wang, H.; Wang, J.; Tai, Y.; Zhang, Y.; Yang, F.; Benezeth, Y. A Deep-Learning Based High-Gain Method for Underwater Acoustic Signal Detection in Intensity Fluctuation Environments. Appl. Acoust. 2023, 211, 109513. [Google Scholar] [CrossRef]
  34. Song, R.; Feng, X.; Wang, J.; Sun, H.; Zhou, M.; Esmaiel, H. Underwater Acoustic Nonlinear Blind Ship Noise Separation Using Recurrent Attention Neural Networks. Remote Sens. 2024, 16, 653. [Google Scholar] [CrossRef]
  35. Liu, S.; Gao, J.; Zhou, H.; Yang, K.; Liu, P.; Du, Y. Study on Weak Sound Signal Separation and Pattern Recognition under Strong Background Noise in Marine Engineering. J. Low Freq. Noise Vib. Act. Control. 2024, 43, 595–608. [Google Scholar] [CrossRef]
  36. Chen, Z.; Luo, Y.; Mesgarani, N. Deep Attractor Network for Single-Microphone Speaker Separation. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 246–250. [Google Scholar] [CrossRef]
  37. Isik, Y.; Roux, J.L.; Chen, Z.; Watanabe, S.; Hershey, J.R. Single-Channel Multi-Speaker Separation Using Deep Clustering. Interspeech 2016, 2016, 545–549. [Google Scholar] [CrossRef]
  38. Berger, C.R.; Zhou, S.; Preisig, J.C.; Willett, P. Sparse Channel Estimation for Multicarrier Underwater Acoustic Communication: From Subspace Methods to Compressed Sensing. IEEE Trans. Signal Process. 2010, 58, 1708–1721. [Google Scholar] [CrossRef]
  39. Graves, A.; Schmidhuber, J. Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures. Neural Netw. 2005, 18, 602–610. [Google Scholar] [CrossRef] [PubMed]
  40. Kolbaek, M.; Yu, D.; Tan, Z.-H.; Jensen, J. Multitalker Speech Separation with Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1901–1913. [Google Scholar] [CrossRef]
  41. Roux, J.L.; Wisdom, S.; Erdogan, H.; Hershey, J.R. SDR—Half-Baked or Well Done? In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 626–630. [Google Scholar]
  42. Luo, Y.; Mesgarani, N. TaSNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 696–700. [Google Scholar]
Figure 1. Architecture of integrated space–air–ground–ocean network.
Figure 2. Mechanism of underwater backscatter node operation.
Figure 3. Underwater backscatter communication and localization system.
Figure 4. The network structure of TF-REF-RNN.
Figure 5. The architecture of the proposed encoder.
Figure 6. The architecture of the proposed underwater backscatter signal separator.
Figure 7. A mixed signal for underwater backscatter signal separation.
Figure 8. Fragments of the effect of different network models on the separation of the backscatter signal 1: (a) Clean signal; (b) TF-REF-RNN; (c) DPRNN; and (d) Conv-TasNet.
Figure 9. Fragments of the effect of different network models on the separation of the backscatter signal 2: (a) Clean signal; (b) TF-REF-RNN; (c) DPRNN; and (d) Conv-TasNet.
Table 1. Dataset parameters.

Parameter | Value
Amount of data | 27,000
Working distance (m) | (10, 15)
Center frequency (Hz) | 15,000
Sampling frequency (Hz) | 52,734
Signal duration (s) | 0.5
Reflection signal duration (s) | 0.4
Throughput (bit/s) | 1034
Bit length | 414
Encoding | Bi-Phase Space Coding
Signal-to-noise ratio (dB) | (−5, 0)
Table 2. Noise data in shipsEar.

Filename | Name | Duration
85__E__1L | Natural ambient noise sample 1 | 85 s
86__E__2M | Natural ambient noise sample 2 | 99 s
87__E__3H | Natural ambient noise sample 3 | 98 s
88__E__4L | Natural ambient noise sample 4 | 93 s
89__E__5M | Natural ambient noise sample 5 | 93 s
90__E__6H | Natural ambient noise sample 6 | 91 s
91__E__7H_N | Natural ambient noise sample 7 | 34 s
92__E__8H_N | Natural ambient noise sample 8 | 67 s
Table 3. Performance comparisons on the BS-2-mix.

Model | SI-SNRi (dB) | SDRi (dB)
TasNet [42] | 14.85 | 10.98
Conv-TasNet [25] | 25.24 | 16.55
DPRNN [26] | 25.73 | 17.02
TF-REF-RNN | 28.55 | 19.51
Table 4. Results of ablation experiment.

Model | SI-SNRi (dB) | SDRi (dB)
DPRNN | 25.73 | 17.02
Encoder with STFT branch | 27.60 | 18.56
Underwater backscatter separator | 28.55 | 19.51
