1. Introduction
Vertical underwater acoustic (UWA) communications are critical in deep-sea activities such as scientific exploration with human-occupied vehicles. The communication waveforms are severely distorted by the Doppler dilation or compression induced by platform movement. In practical communication systems [1,2,3,4,5], the Doppler is usually coarsely estimated using synchronization signals, and the residual Doppler is then tracked by the equalizer with the aid of pilot symbols or decided information symbols. Improving the accuracy of time-varying Doppler synchronization relieves the burden on the adaptive equalizer and thus prevents the adaptive algorithm from diverging in response to impulsive noise. Among the motion sources, the fluctuation of the surface platform is the hardest to estimate because of its randomness. The ocean surface height was modeled as a sinusoid with a period of 8 s in [6], where the time difference between the transmitter and receiver was estimated over 0.5 s with a constant acceleration model. According to experimental measurements [7] of the channel impulse response (CIR), the surface motion is a narrow-band random process. A more accurate surface height model is given by the Pierson–Moskowitz (PM) wave height spectrum [8], which has been used in the analysis of surface acoustic reflection [9]. However, the PM model has not yet been applied to Doppler estimation in UWA communications.
Doppler synchronization for UWA communications consists primarily of two steps, which accomplish the coarse and fine estimations, respectively [10,11]. In the first step, the time-invariant motion parameters, such as the timing offset (displacement), the Doppler scale factor (speed), and the Doppler rate (acceleration), are estimated through correlation with the synchronization signal. The correlation can be a single-branch one for Doppler-insensitive signals or a multiple-branch one [12] for Doppler-sensitive signals. The single-branch correlation has low complexity, but its estimation results are biased according to the delay–Doppler ambiguity function. The multiple-branch correlation is unbiased when enough branches are produced; however, this comes at the cost of very high computational complexity. In the second step, where the continuous movement is tracked, different methods are used according to the modulation structure, such as an adaptive equalizer combined with a phase-locked loop (PLL) for single-carrier (SC) systems [1] or intercarrier interference (ICI) cancellation for orthogonal frequency division multiplexing (OFDM) systems. This paper proposes that the motion trajectory can be estimated without the tracking specified above if the relationship among the sequential synchronization results is fully utilized.
Deep learning was initially developed in the areas of computer vision and natural language processing, where mathematical description is difficult. Recently, deep learning has become prominent in wireless communications and UWA communications. The deep neural network (DNN)-based OFDM receiver proposed in [13,14] was shown to be more robust than conventional methods; this model was extended to deal with the time-varying underwater acoustic channel and was verified by simulations [7]. A DNN-based online-training UWA receiver [15] was trained by the adaptive moment estimation (Adam) optimizer for each sub-block of the time-varying UWA channel and was verified with sea trial data. To reduce the training burden, model-driven deep learning [16] used expert knowledge to make the network explainable and predictable. In [17], a DNN-based channel estimation for online UWA communications was proposed that had better simulation performance than the minimum mean squared error (MMSE) algorithm and also consumed less run time and storage. In [18], a convolutional neural network was introduced to correct errors in symbol timing and carrier offset estimation in UWA communications under the condition of a flat frequency response. The main difficulty of UWA communications is the high Doppler effect; we anticipate that this can be addressed by combining deep learning with expert knowledge to reduce training requirements.
In this paper, a novel receiver structure for underwater vertical acoustic communication is proposed, where the correlation-based estimation bias of the timing offset is learned and then estimated by the DNN to an accuracy that renders the real-time adjustment of the subsequent equalizer unnecessary. The expert knowledge of the ambiguity function of linear frequency modulation (LFM) was utilized in the design of the DNN model to reduce the parameter size for tuning. The model was based on the PM random surface height model with a moderate wind speed and was verified to be applicable for various wind speeds and experimental waveforms. This receiver, embedded with the DNN model, demonstrated lower complexity and better performance than the adaptive equalizer-based receiver. This paper’s structure is as follows:
Section 2 presents the channel model and the transmission packet structure.
Section 3 describes the DNN model for synchronization and the full receiver used in the low-complexity baseband processing.
Section 4 shows the results of the simulations and deep-sea experiments.
Section 5 presents the study’s conclusions.
2. Transmission Packet Structure and Channel Model
To reduce the synchronization complexity, we used multiple equispaced LFM signals in one packet, an arrangement that is widely used in UWA communications [19,20]. The LFM signal in the analytic signal form is written as:
where
and
are the start and end frequencies, respectively, and
is the duration of the synchronization signal. A whole packet consists of
LFMs, with the occurrence period of
and the transmitted synchronization given by:
The data frames are transmitted among the synchronization signals.
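The packet layout described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `lfm_analytic` and `sync_train` are hypothetical, the chirp uses the textbook analytic LFM form (a linear sweep from the start to the end frequency), and the numerical values (8 to 12 kHz sweep, 50 ms duration, 0.5 s period, 48 kHz sampling) are placeholders rather than the parameters used in this paper.

```python
import numpy as np

def lfm_analytic(f_start, f_end, duration, fs):
    """Analytic (complex) LFM sweeping f_start -> f_end over `duration` seconds."""
    n = int(round(duration * fs))
    t = np.arange(n) / fs
    chirp_rate = (f_end - f_start) / duration  # Hz per second
    return np.exp(1j * 2 * np.pi * (f_start * t + 0.5 * chirp_rate * t**2))

def sync_train(f_start, f_end, duration, period, n_lfm, fs):
    """Place n_lfm identical LFMs every `period` seconds.

    The zero-valued gaps between LFMs are where the data frames would go.
    """
    pulse = lfm_analytic(f_start, f_end, duration, fs)
    hop = int(round(period * fs))
    train = np.zeros((n_lfm - 1) * hop + pulse.size, dtype=complex)
    for i in range(n_lfm):
        train[i * hop : i * hop + pulse.size] = pulse
    return train

# Illustrative values only (assumptions, not the paper's parameters).
FS = 48_000
pulse = lfm_analytic(8_000.0, 12_000.0, 0.05, FS)
train = sync_train(8_000.0, 12_000.0, 0.05, 0.5, 15, FS)
```

The real part of `train` would be the transmitted passband synchronization waveform; the data frames would occupy the gaps between consecutive LFMs.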
The acoustic channel was modeled as the cascade of the time-invariant linear filter and the motion-induced time-varying Doppler compression or dilation. The CIR is represented by
, and its convolution with
is written as:
The received waveform in the passband is then given by:
where
is the time-varying distance between the transmitter and the receiver,
is the speed of sound, and
is the additive noise. During vertical communication between the submersible and the mother ship, the position of the mother ship fluctuates rapidly with the surface wave height. Therefore, in this scenario, the displacement is modeled by the time-varying surface wave height. The Pierson–Moskowitz model was adopted, whose spectrum is given in [8] as:
where
is the wind speed, and the other constant parameters take the following values:
,
, and
.
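As an illustration of how such a random surface displacement can be generated, the sketch below draws one realization of the wave height by summing cosines with random phases whose amplitudes follow the PM spectrum. It assumes the commonly quoted textbook form of the spectrum, S(ω) = αg²ω⁻⁵ exp[−β(g/(Uω))⁴], with α = 8.1 × 10⁻³, β = 0.74, and g = 9.81 m/s²; these constants, the function names, and the nominal sound speed of 1500 m/s are assumptions made for this example and may differ from the exact parameterization used here.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2 (assumed textbook value)

def pm_spectrum(omega, wind_speed, alpha=8.1e-3, beta=0.74):
    """Textbook Pierson-Moskowitz spectrum S(omega) in m^2 s/rad."""
    omega = np.asarray(omega, dtype=float)
    spec = np.zeros_like(omega)
    pos = omega > 0
    omega0 = G / wind_speed
    spec[pos] = (alpha * G**2 / omega[pos] ** 5
                 * np.exp(-beta * (omega0 / omega[pos]) ** 4))
    return spec

def surface_height(t, wind_speed, n_comp=400, omega_max=4.0, seed=None):
    """One random realization of the surface height (m) at times t (s)."""
    rng = np.random.default_rng(seed)
    d_omega = omega_max / n_comp
    omega = (np.arange(n_comp) + 0.5) * d_omega          # midpoint frequency grid
    amp = np.sqrt(2.0 * pm_spectrum(omega, wind_speed) * d_omega)
    phase = rng.uniform(0.0, 2.0 * np.pi, n_comp)
    t = np.asarray(t, dtype=float)
    return (amp[:, None] * np.cos(np.outer(omega, t) + phase[:, None])).sum(axis=0)

def propagation_delay(height, sound_speed=1500.0):
    """Delay change (s) caused by a displacement of `height` metres."""
    return np.asarray(height) / sound_speed

# Example: 60 s of surface motion at a wind speed of 15 m/s.
t = np.arange(0.0, 60.0, 0.1)
height = surface_height(t, wind_speed=15.0, seed=0)
delay = propagation_delay(height)
```

Under this form, the spectral peak lies at (4β/5)^(1/4)·g/U and the height variance is αU⁴/(4βg²), so a stronger wind moves the energy toward lower frequencies and increases the displacement.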
4. Simulation and Experimental Results
The start and end frequencies of the LFM synchronization signal were
and
, respectively, and its duration was
. Therefore, according to (8), the ratio of the bias delay to the Doppler was
. The duration of the quadrature phase shift keying (QPSK) symbol was
and the passband and baseband sample intervals were
and
, respectively. For example, with a Doppler scale factor
, the delay bias after the correlation was
, which caused a symbol phase rotation of
, and therefore adaptation of the equalizer was necessary if the bias was left uncompensated. In the packet, there were
frames whose individual duration was
, and the total number of the LFM signals was
, which was equal to the sizes for both the input and output of the DNN model.
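The coupling between Doppler and delay bias can be reproduced numerically. The sketch below is an illustration under assumptions, not a restatement of Equation (8): it uses the standard narrowband approximation in which a Doppler scale factor a shifts an up-chirp by a·f_c in frequency, so that the matched-filter peak moves by about −a·f_c·T/B, where f_c is the band center, T the duration, and B the bandwidth. The function names and the numerical values are hypothetical.

```python
import numpy as np

def lfm_delay_bias(doppler_scale, f_start, f_end, duration):
    """Approximate correlation-peak shift (s) of an up-chirp: -a * fc * T / B."""
    f_center = 0.5 * (f_start + f_end)
    bandwidth = f_end - f_start
    return -doppler_scale * f_center * duration / bandwidth

def phase_rotation(delay_bias, f_center):
    """Carrier phase rotation (rad) caused by an uncompensated timing bias."""
    return 2.0 * np.pi * f_center * delay_bias

def measured_bias(doppler_scale, f_start, f_end, duration, fs):
    """Locate the matched-filter peak for a frequency-shifted LFM.

    Narrowband model (assumption): the Doppler is applied as a pure
    frequency shift of doppler_scale * f_center, not as a time scaling.
    """
    t = np.arange(int(round(duration * fs))) / fs
    rate = (f_end - f_start) / duration
    ref = np.exp(1j * 2 * np.pi * (f_start * t + 0.5 * rate * t**2))
    shift = doppler_scale * 0.5 * (f_start + f_end)   # Doppler shift in Hz
    rx = ref * np.exp(1j * 2 * np.pi * shift * t)
    corr = np.abs(np.correlate(rx, ref, mode="full"))
    lag = int(np.argmax(corr)) - (ref.size - 1)       # lag in samples
    return lag / fs

# Illustrative values only: a = 1e-3, 8-12 kHz sweep, 50 ms, fs = 48 kHz.
predicted = lfm_delay_bias(1e-3, 8_000.0, 12_000.0, 0.05)
measured = measured_bias(1e-3, 8_000.0, 12_000.0, 0.05, 48_000)
rotation = phase_rotation(predicted, 10_000.0)
```

With these placeholder values, the predicted bias is 0.125 ms in magnitude, which corresponds to more than one full carrier cycle at 10 kHz. This shows why even a small uncompensated Doppler leaves a phase error that the equalizer would otherwise have to track.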
Table 1 outlines the hyper-parameters of the DNN model. The model was trained on simulation data generated in accordance with Figure 1, with a constant wind speed of 15 m/s. The number of training samples was 100,000, and the model was trained for 50 epochs. The training was completed in 56 s on a laptop computer with an Intel i7-8750 CPU at a 2.2 GHz clock frequency.
The bias was estimated by the trained DNN model and removed from the delay estimate. The unbiased positions of the LFMs were then interpolated with a cubic spline function to obtain the timing offset for the whole packet. For comparison, the unbiased positions were also interpolated with a linear function in the simulations. Two further sets of LFM positions were included for reference: the biased positions given directly by the correlation, and the perfect positions. The number of validation samples was 2000 for each wind speed condition, and the wind speed in the test varied from 5 to 25 m/s. The root-mean-square errors (RMSEs) of the timing offsets for the whole packet were compared in the simulations, as shown in Figure 4. It should be noted that we first investigated the performance of DNN models trained at different wind speeds in order to select the appropriate model, and the results are shown in Figure 4 for comparison. It can be concluded that a model trained at a specific wind speed can cope with all test scenarios at lower wind speeds and, within a certain range, some test scenarios at higher wind speeds, since the spectral content at a high wind speed covers that at a low wind speed. However, for extremely high wind speeds, the sample distribution space is wider, and more training samples are needed to avoid overfitting. Based on these factors, the network model trained at a wind speed of 15 m/s was adopted to process the test data in the following analysis. We then focused on comparing the pairwise combinations of the three LFM position estimation methods and the two interpolation methods. The biased LFM synchronization performed the worst for both interpolation methods; because the bias dominated the RMSE, spline interpolation brought no improvement over linear interpolation. After the bias was eliminated by the DNN model, the performance was nearly the same as that of the perfect LFM synchronization under either interpolation method, and spline interpolation achieved about half the RMSE of linear interpolation. For the proposed scheme of the DNN model with spline interpolation, the RMSE was nearly 0.01 ms across the various wind speeds, which means a phase rotation error of
. The RMSE floor was mainly caused by the high-frequency components of the surface motion and the limited repetition frequency of the LFMs. Increasing the number of LFMs within the same packet duration suppressed the RMSE but sacrificed the transmission bandwidth.
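The difference between the two interpolation methods can be illustrated with a toy trajectory. The sketch below is a simplified stand-in for the simulation described above, not a reproduction of it: it assumes perfect (unbiased) LFM positions, replaces the PM surface motion with a single sinusoid of 8 s period and 2.25 m valley-to-peak height, and uses a hypothetical LFM period of 0.5 s, a nominal sound speed of 1500 m/s, and SciPy's `CubicSpline` with its default boundary condition.

```python
import numpy as np
from scipy.interpolate import CubicSpline

SOUND_SPEED = 1500.0  # m/s, nominal value (assumption)

def timing_offset(t, amplitude=1.125, period=8.0):
    """Toy trajectory: sinusoidal heave converted to a propagation delay (s)."""
    return amplitude * np.sin(2.0 * np.pi * t / period) / SOUND_SPEED

def interpolation_rmse(n_lfm=15, lfm_period=0.5, fs=1000.0):
    """RMSE of linear and cubic-spline interpolation between LFM positions."""
    t_lfm = np.arange(n_lfm) * lfm_period            # instants of the LFMs
    t_all = np.arange(0.0, t_lfm[-1], 1.0 / fs)      # dense grid over the packet
    truth = timing_offset(t_all)
    knots = timing_offset(t_lfm)                     # perfect LFM timing estimates
    linear = np.interp(t_all, t_lfm, knots)
    spline = CubicSpline(t_lfm, knots)(t_all)

    def rmse(estimate):
        return float(np.sqrt(np.mean((estimate - truth) ** 2)))

    return rmse(linear), rmse(spline)

rmse_linear, rmse_spline = interpolation_rmse()
```

For such a smooth trajectory, the spline follows the curvature between LFMs that linear interpolation ignores, so `rmse_spline` falls below `rmse_linear`. The gap depends on the chosen LFM period and on how much high-frequency motion is present.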
The computational complexities of the main functional modules for one packet are listed in Table 2. The total complexity of the proposed scheme was approximately six million multiply–accumulate operations (MACs), which is 24% of that of the traditional scheme. The complexity of the DNN was negligible because expert knowledge of the timing estimation was built into its design. The module with the most operations in the proposed scheme was the resampling module, whose complexity was also halved by using the integer symbol-spaced sampling rate rather than the fractional one. The complexity of the equalizer was greatly reduced because of its time-invariant structure.
The proposed DNN-based scheme was verified on an experimental data packet recorded during the 2011 sea trials of China’s first deep manned submersible “Jiaolong.” The experimental conditions were described by Zhu [1]. Unfortunately, the wind speed during the experiments was not recorded. However, from the timing estimation, it can be deduced that the surface wave height from valley to peak was 2.25 m and the period was approximately 8 s; these measurements characterize the instantaneous sea condition. The vertical and horizontal communication distances were 5030 m and 2390 m, respectively. The packet consisted of 15 LFMs and 14 data frames, and each frame contained 200 training symbols and 1936 information symbols. Only the 10 internal frames were used in the performance comparisons because the two boundary frames on each side had fewer LFM signals nearby for synchronization. During the experiments, the receiver consisted of four hydrophone channels, which were jointly processed using the model from [1]. The hydrophone in the first channel was omnidirectional, and the other three hydrophones were cone-directional; their signal-to-noise ratios (SNRs) were 9.3, 20.1, 18.5, and 16.9 dB, respectively, and the total SNR was 18.9 dB (as described in [1]), which was calculated using four-channel equal-gain combining. In this paper’s comparisons of the traditional and proposed methods, we used only the second channel, which had the largest SNR, and the adaptive equalizer was replaced by a time-invariant one for robustness against impulsive noise.
Figure 5 illustrates the timing offset before unbiasing and after interpolation, as well as the bias output of the DNN, which are marked in the previous section as
,
, and
, respectively. We can see that, after spline interpolation, the timing offset was much smoother, which fits the true movement, and the bias estimated by the DNN was in proportion to the first-order derivative of the displacement.
As the perfect timing offsets were not available, the DNN-based synchronization methods and the biased LFM synchronization methods were compared in terms of the QPSK phase trajectories and symbol error rates (SERs), as shown in Figure 6. As seen in Figure 6a, for biased LFM synchronization with linear interpolation, the symbols in the middle of the frame usually suffered the maximum offset because of the uncompensated acceleration effect, which was first analyzed by Sharif [21]. Spline interpolation could suppress the acceleration effect, so the error of the phase trajectory changed linearly between the boundaries of each frame (see Figure 6b), and the phase error was significant at the end of the frame because of the uncompensated synchronization bias. After the bias was learned by the DNN model, the QPSK symbols were well recovered, as shown in Figure 6d, with an SER of 0.002. This was considered an excellent result for a single-channel receiving scheme of LFM-based synchronization followed by a simple time-invariant equalizer.
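The link between a residual timing error, the QPSK phase trajectory, and the SER can be made concrete with a small noise-free sketch. It is illustrative only: the constellation mapping, the function names, and the 10 kHz center frequency are assumptions, and the hard-decision detector below stands in for the equalizer output stage rather than reproducing the receiver used in the experiments.

```python
import numpy as np

# Unit-energy QPSK constellation at 45, 135, 225, and 315 degrees (assumed mapping).
QPSK = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))

def qpsk_decide(symbols):
    """Hard decision: index of the nearest constellation point."""
    return np.argmin(np.abs(symbols[:, None] - QPSK[None, :]), axis=1)

def symbol_error_rate(tx_index, rx_symbols):
    """Fraction of symbols whose hard decision differs from the sent index."""
    return float(np.mean(qpsk_decide(rx_symbols) != tx_index))

def rotate(symbols, timing_error, f_center):
    """Apply the carrier phase rotation caused by a residual timing error (s)."""
    return symbols * np.exp(-1j * 2.0 * np.pi * f_center * timing_error)

rng = np.random.default_rng(0)
tx_index = rng.integers(0, 4, size=1000)
tx = QPSK[tx_index]

F_CENTER = 10_000.0  # Hz, hypothetical value for illustration
ser_small = symbol_error_rate(tx_index, rotate(tx, 1.0e-5, F_CENTER))
ser_large = symbol_error_rate(tx_index, rotate(tx, 2.5e-5, F_CENTER))
```

In this noise-free setting, decisions stay correct as long as the phase rotation remains below π/4. A timing error of 0.01 ms at the assumed 10 kHz rotates the symbols by about 0.63 rad and causes no errors, whereas 0.025 ms rotates them by π/2 and every decision lands on the neighbouring constellation point.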