1. Introduction
Free-space optical (FSO) communication uses free space, such as vacuum, the atmosphere, or the ocean, as the transmission medium, with lasers serving as the information carriers [1]. The technology has attracted significant attention in optical communication because of its large capacity, high bandwidth, and strong security [2]. Coherent detection is particularly promising for FSO systems [3], especially when paired with high-order modulation formats, which have attracted considerable research interest [4,5,6].
However, the performance of FSO communication systems is limited by factors such as changing weather and device manufacturing tolerances, and transmission through the atmospheric channel further degrades the received signal [7,8,9,10]. To address these challenges, digital signal processing (DSP) is employed at the receiver to reduce the overall bit error rate (BER) [11]. The DSP workflow is illustrated in Figure 1.
The processing sequence in Figure 1 is as follows. The received signal first undergoes I/Q balancing, using the algorithm from [12] to equalize and compensate the amplitude and phase mismatch between the I and Q paths. The timing recovery module then uses the algorithm from [13] to track and compensate for the frequency offset and phase jitter between the transmitter clock and the receiver ADC sampling clock, preventing degradation of the BER. Next, polarization demultiplexing is applied, primarily for channel equalization. The proposed Time-Tagged QPSK partitioning (TTQP) frequency offset estimation (FOE) algorithm then estimates and corrects the carrier frequency offset. Carrier phase recovery follows to estimate and compensate for phase distortion, after which the decision-directed least-mean-square (DD-LMS) algorithm further improves signal quality. Finally, the BER is calculated to assess the error performance after DSP.
Depending on the application scenario, DSP in a digital coherent receiver is either offline or online. Offline processing applies the DSP algorithms to stored signals in simulation software; it can only compensate pre-stored data, but it provides an effective means of verifying designs intended for online processing [17]. Online processing, in contrast, operates on the signal in real time and must account for both resource utilization and timing constraints [14,15,16]. On this basis, we present an offline design of a parallel DSP algorithm in this study.
Currently, FOE algorithms fall into two broad categories: frequency-domain carrier FOE and time-domain carrier FOE [18]. For example, the frequency-domain maximum spectral line search algorithm based on the 4th power fast Fourier transform (4th-FFT) [19] is suitable for various modulation formats and offers high estimation accuracy [18,21], but it is computationally complex. Hardware implementations must trade complexity against performance and consume significant resources, which limits the practical applicability of the algorithm [20].
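The idea behind the 4th-FFT approach can be sketched in a few lines. This is a minimal illustration for a noise-free QPSK signal, not the implementation from the cited works; the function name `fft4_foe`, the 10 GBaud symbol period, and the test offset are assumptions chosen for the example.

```python
import numpy as np

def fft4_foe(received, symbol_period):
    """Estimate the carrier frequency offset from the peak of the 4th-power spectrum."""
    spectrum = np.abs(np.fft.fft(received ** 4))            # 4th power strips the QPSK modulation
    freqs = np.fft.fftfreq(received.size, d=symbol_period)  # signed frequency of each FFT bin
    return freqs[np.argmax(spectrum)] / 4.0                 # the spectral line sits at 4x the offset

# Hypothetical noise-free QPSK test signal at 10 GBaud.
rng = np.random.default_rng(1)
n_symbols, symbol_period = 1024, 1e-10
true_offset = 20 / (4 * n_symbols * symbol_period)  # chosen to land exactly on an FFT bin
n = np.arange(n_symbols)
qpsk = np.exp(1j * (np.pi / 4 + rng.integers(0, 4, n_symbols) * np.pi / 2))
received = qpsk * np.exp(2j * np.pi * true_offset * symbol_period * n)
estimate = fft4_foe(received, symbol_period)
```

In this form the resolution is one FFT bin divided by four, that is 1/(4NT) for N symbols of duration T, so higher accuracy requires a longer FFT. This is one way to see where the complexity of the frequency-domain approach comes from.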
Time-domain FOE does not involve frequency-domain transformations and is generally less complex than frequency-domain FOE [22]. For instance, the training sequence (TS)-based FOE algorithm [23] uses a training sequence to remove the modulation phase of the received signal. This approach avoids the computationally intensive 4th power operation, thereby reducing the complexity of the frequency offset estimation (FOE). However, it requires the original training sequence to be pre-stored at the receiver, which introduces training overhead [24]. The time-domain 4th power algorithm [22] is straightforward to implement, but it has poor compatibility with other modulation formats and is impractical for high-order QAM systems.
The traditional QPSK partitioning algorithm [25] addresses the limitations of the time-domain 4th power approach for QAM signals, but it ignores the time interval between the selected signals, which reduces estimation accuracy [26,27,28]. To address this issue, some researchers have modified the traditional QPSK partitioning algorithm and proposed a generalized carrier FOE algorithm [26]. However, implementing this modified algorithm in hardware is challenging [27]. The FOE method discussed in [28] accounts for the time interval, but at the cost of high algorithmic complexity.
To address the aforementioned issues, we propose a TTQP FOE scheme and apply it to a simulation platform for spatial diversity reception of polarization-multiplexed (PM) 16QAM FSO signals.
2. Operation Principle
Our algorithm differs from traditional QPSK partitioning methods in three fundamental ways: First, it employs time-tagging to accurately record the time intervals between signals, effectively mitigating cumulative effects and significantly enhancing estimation accuracy. Second, using 16QAM signals as an example, traditional methods rely on 4th power operations to remove the modulation phase, which introduces higher computational complexity and implementation challenges. In contrast, our approach utilizes a low-complexity 4th power operation, employing absolute value operations to achieve a fourfold magnification of the phase angle and thereby remove the modulation phase. Finally, our algorithm is designed as a parallel processing framework suitable for hardware implementation, providing a solid foundation for offline algorithm design and validation, and ensuring seamless integration with hardware systems.
The principle of the TTQP parallel algorithm is illustrated in Figure 2, where the symbol * denotes the conjugate operation. This paper explains one of the processing paths in detail. The TTQP frequency offset estimation (FOE) algorithm consists of three stages: data parallelization and time-tagging, frequency offset estimation, and frequency offset compensation. Each stage is described in this section.
2.1. Data Processing
Time-tagging the 16QAM signals preserves the temporal order of each signal. QPSK partitioning is then performed on the time-tagged signals.
The principle of time-tagged QPSK partitioning is as follows: the time-tagged signals, referred to as Timing Marker Signals (TMS), are classified by their Euclidean distance from the origin. Class I signals are retained and denoted QPSK-TMS, while Class II signals are excluded [25].
Figure 3a illustrates the constellation points for Class I (QPSK-TMS) and Class II signals. The two classes lie on concentric rings of different radii, which are approximated as circles with radii $R_1$, $R_2$, and $R_3$. Figure 3b shows that signals on the circles with radii $R_1$ and $R_3$ are classified as Class I signals, while signals on the circle with radius $R_2$ are classified as Class II signals. To reduce the algorithm complexity, we use the square of the Euclidean distance to determine appropriate decision boundaries. The squared radii $R_1^2$, $R_2^2$, and $R_3^2$ are 2, 10, and 18, respectively. Therefore, the classification criteria for Class I and Class II are as follows:
where the quantity being compared is the squared Euclidean distance of the signal point from the origin, and the reference values are the squared radii $R_1^2$, $R_2^2$, and $R_3^2$.
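The partitioning and time-tagging step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the decision boundaries are assumed to be the midpoints between adjacent squared radii (6 and 14), and the function name `ttqp_partition` and the sample block are hypothetical.

```python
import numpy as np

# Squared ring radii of unit-grid 16QAM (values given in the text).
R1_SQ, R2_SQ, R3_SQ = 2.0, 10.0, 18.0
# Assumed decision boundaries: midpoints between adjacent squared radii.
LOW_BOUND = (R1_SQ + R2_SQ) / 2.0    # 6.0
HIGH_BOUND = (R2_SQ + R3_SQ) / 2.0   # 14.0

def ttqp_partition(symbols):
    """Return (Class I symbols, their time tags, gaps between adjacent tags)."""
    symbols = np.asarray(symbols, dtype=complex)
    dist_sq = symbols.real ** 2 + symbols.imag ** 2       # squared distance, no square root
    is_class_i = (dist_sq < LOW_BOUND) | (dist_sq > HIGH_BOUND)
    time_tags = np.flatnonzero(is_class_i)                # original index of each kept symbol
    gaps = np.diff(time_tags)                             # interval between adjacent kept symbols
    return symbols[is_class_i], time_tags, gaps

block = np.array([1 + 1j, 3 + 1j, -3 + 3j, 1 - 3j, -1 - 1j, 3 - 3j])
kept, tags, gaps = ttqp_partition(block)
```

For this block the middle-ring symbols at positions 1 and 3 are discarded, so the kept symbols carry the tags 0, 2, 4, 5 and are separated by gaps of 2, 2, and 1 symbol periods.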
As illustrated in Figure 4, the QPSK-TMS sequence is obtained by applying QPSK partitioning to the TMS. Because Class II signals are discarded, adjacent QPSK-TMS signals may be discontinuous in time. Each QPSK-TMS signal is therefore described by its position within the QPSK-TMS sequence and by the time interval separating it from the adjacent QPSK-TMS signal. In the subsequent frequency offset estimation stage, unlike traditional QPSK partitioning methods, the TTQP algorithm enhances estimation accuracy by mitigating the cumulative effects of these time intervals.
2.2. Frequency Offset Estimation
For a spatial coherent laser communication system with a 16QAM modulation format, and neglecting the flicker (scintillation) effect of the atmospheric channel on the noise intensity of the transmitted signal, the signal at the input of the frequency offset estimation and compensation module is as follows:
where the terms are the modulation phase of the nth signal in the input signal sequence, the frequency offset value, the symbol duration T, the carrier phase noise (including phase noise introduced by the laser linewidth and atmospheric turbulence), and the amplified spontaneous emission (ASE) phase noise, which follows a zero-mean Gaussian distribution [24].
In the second stage shown in Figure 2, each input signal is first multiplied by the conjugate of the previous signal, giving a differential signal. The real and imaginary parts of this differential signal are then extracted and passed through a low-complexity 4th power operation, which multiplies its phase angle by four. The principles behind the simplified 4th power computation are outlined below.
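For reference, the exact operation that the low-complexity version approximates can be written with real arithmetic only, by applying the double-angle identities twice to the real and imaginary parts. The sketch below is this exact baseline, not the simplified algorithm of the paper; the function names and the residual phase of 0.05 rad are assumptions for illustration.

```python
import numpy as np

def double_angle(re, im):
    """Map (A cos t, A sin t) to (A^2 cos 2t, A^2 sin 2t) using real multiplications."""
    return re * re - im * im, 2.0 * re * im

def fourth_power(z):
    """Quadruple the phase angle of z by doubling it twice."""
    re2, im2 = double_angle(z.real, z.imag)
    re4, im4 = double_angle(re2, im2)
    return complex(re4, im4)

# A differential signal with a small residual phase plus a QPSK jump of k*pi/2.
residual = 0.05
angles = [np.angle(fourth_power(np.exp(1j * (residual + k * np.pi / 2))))
          for k in range(4)]
```

All four entries of `angles` come out as four times the residual phase, which shows how the 4th power removes the modulation jumps. Each doubling needs three real multiplications here, and these are the operations the absolute-value substitution is meant to avoid.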
Figure 5a compares the squared sine function with the absolute value of the sine function, and Figure 5b does the same for the cosine function. The horizontal axis of both figures is the angle in radians. The figures show that the two curves are close, indicating that the squaring operations for the sine and cosine functions can be approximated by absolute value operations. This substitution is the core of the low-complexity 4th power algorithm: it uses absolute value operations to achieve a fourfold increase in the phase angle, thereby avoiding the high-complexity 4th power computations of traditional algorithms [29].
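The size of the substitution error can be checked numerically. The following is a quick sketch that compares the squared and absolute-value curves on a dense grid; the grid density and variable names are arbitrary choices for this example.

```python
import numpy as np

theta = np.linspace(0.0, 2.0 * np.pi, 100001)   # angle grid in radians
gap_sin = np.abs(np.sin(theta) ** 2 - np.abs(np.sin(theta)))
gap_cos = np.abs(np.cos(theta) ** 2 - np.abs(np.cos(theta)))
worst_sin, worst_cos = gap_sin.max(), gap_cos.max()
```

The two curves coincide wherever the function equals 0 or ±1, and the largest gap is about 0.25, reached where the absolute value equals 0.5. The substitution is therefore an approximation that trades some accuracy for fewer multiplications.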
The design of the low-complexity 4th power algorithm aims to simplify the computational complexity associated with traditional 4th power operations. The following section provides the derivation of this algorithm.
Step 1: Double the angle of the trigonometric functions corresponding to the real and imaginary parts. First, the real and imaginary parts of the signal can be expressed using the following equation:
Figure 5 shows that squaring the trigonometric functions and taking their absolute values give waveforms with comparable patterns of variation, which supports substituting one for the other [24]. Therefore, the following equation can be obtained:
Next, to achieve a twofold increase in the phase angle, we first use the double-angle formulas to expand the cosine and sine of the doubled angle, and then apply Equation (4) to complete the double-angle simplification. Substituting Equation (4) into the double-angle formula for the cosine, we obtain the following equation:
The sine term requires further elaboration. From Equation (4), we can derive the following equation:
From the above derivation, it is evident that the sine and cosine terms must be scaled consistently for the doubled angle to be free of errors. Equation (6) indicates that the sine term is scaled by a constant factor. To keep the coefficients balanced, Equation (5) must be multiplied by a matching coefficient, resulting in the following formulas:
Step 2: Perform the 4th power operation on the trigonometric functions corresponding to the real and imaginary parts. The fourfold increase in the angle is obtained by applying the angle-doubling principle of Step 1 a second time. Note, however, that to balance the coefficients of the real and imaginary parts, only the cosine coefficient needs to be multiplied by the balancing coefficient in the first step.
Finally, to achieve a fourfold increase in the phase angle, we obtain the following equation:
The preceding steps achieved a fourfold increase in the phase angle. The next phase involves frequency offset estimation and compensation. Since the implementation principles are consistent across all parallel paths, we will focus on detailing the process for a single path. The following formula is used to complete the calculation of the frequency offset value:
In this formula, the left-hand side is the frequency offset estimate for the symbol block, M is the number of symbols contained in each symbol block, and the time-interval term accounts for the discontinuities between adjacent signals caused by QPSK partitioning, over which the frequency offset accumulates. To reduce the impact of additive ASE noise, the estimate is summed and averaged over multiple symbols.
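The role of the time intervals can be illustrated with a small experiment. The sketch below is one plausible reading of the estimation step, not the paper's exact formula: it uses the exact 4th power rather than the low-complexity version, normalizes each differential phase by its own gap before averaging, and assumes the phase advance over the largest gap stays below the wrapping limit. The function name `ttqp_estimate` and all signal parameters are hypothetical.

```python
import numpy as np

def ttqp_estimate(kept, gaps, symbol_period):
    """Estimate the frequency offset (Hz) from Class I symbols and their time gaps."""
    diff = kept[1:] * np.conj(kept[:-1])   # phase change between adjacent kept symbols
    phase4 = np.angle(diff ** 4)           # 4th power removes the k*pi/2 modulation jumps
    per_symbol = phase4 / (4.0 * gaps)     # undo the factor 4 and the gap: radians per symbol
    return np.mean(per_symbol) / (2.0 * np.pi * symbol_period)

# Hypothetical noise-free Class I symbols with irregular spacing (10 GBaud, 50 MHz offset).
rng = np.random.default_rng(0)
symbol_period, true_offset = 1e-10, 50e6
gaps = rng.integers(1, 4, size=255)                # 1 to 3 symbol periods between kept symbols
tags = np.concatenate(([0], np.cumsum(gaps)))      # time tag of each kept symbol
ring = rng.choice([1.0, 3.0], size=tags.size)      # inner or outer 16QAM ring
quadrant = rng.integers(0, 4, size=tags.size)
clean = ring * np.sqrt(2) * np.exp(1j * (np.pi / 4 + quadrant * np.pi / 2))
received = clean * np.exp(2j * np.pi * true_offset * symbol_period * tags)

with_tags = ttqp_estimate(received, gaps, symbol_period)
without_tags = ttqp_estimate(received, np.ones_like(gaps), symbol_period)
```

In this toy case `with_tags` recovers the 50 MHz offset, while `without_tags`, which treats every pair as one symbol apart, is inflated by roughly the mean gap. This is the accumulation effect that time-tagging is meant to remove.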
The actual accumulated frequency offset error is given by the following formula:
Here, n is the position of the received signal within the entire received sequence, and the remaining term is the frequency offset of the current symbol.
2.3. Frequency Offset Compensation
Next, we proceed to the frequency offset compensation stage. The algorithm follows a parallel scheme designed for hardware implementation, in which the frequency offset is calculated once for every M symbols (i.e., the symbol block length is M). As shown in Figure 6, the explanation below takes a single symbol within a single symbol block as an example. The frequency offset for each symbol block was calculated in the second stage, and the accumulated frequency offset is obtained with the following formula:
Here, m is the index of the signal within the symbol block. The formula combines the basic frequency offset error, the actual frequency offset accumulation of the last signal in the previous symbol block, and a summation that gives the actual frequency offset accumulation for the current symbol.
The last stage is the frequency offset compensation stage; the formula is as follows:
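Block-wise compensation with a carried-over phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formula: each block is de-rotated with its own offset estimate, and the phase reached at the last symbol of the previous block is carried forward so that the correction stays continuous. The function name `compensate` and the test values are hypothetical.

```python
import numpy as np

def compensate(received, block_offsets_hz, block_len, symbol_period):
    """Remove a per-block frequency offset while keeping the phase continuous across blocks."""
    out = np.empty_like(received)
    carried = 0.0                         # phase accumulated up to the previous block's last symbol
    m = np.arange(1, block_len + 1)       # symbol index inside a block
    for k, offset in enumerate(block_offsets_hz):
        phase = carried + 2.0 * np.pi * offset * symbol_period * m
        block = slice(k * block_len, (k + 1) * block_len)
        out[block] = received[block] * np.exp(-1j * phase)
        carried = phase[-1]
    return out

# Hypothetical check: a pure rotating carrier should be de-rotated to a constant.
block_len, symbol_period, offset = 4, 1e-10, 50e6
n = np.arange(1, 3 * block_len + 1)
received = np.exp(2j * np.pi * offset * symbol_period * n)
restored = compensate(received, [offset] * 3, block_len, symbol_period)
```

Without the carried phase, each block would restart its correction from zero and the output would show a phase jump at every block boundary.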
3. Simulation Setup
Atmospheric turbulence induces phase distortions and intensity fluctuations, which result in noise. These noise effects are observed at the receiver as variations in signal strength and phase, significantly impacting system performance. Additionally, due to limitations in the laser manufacturing processes, the center frequencies of the signal laser and the local oscillator (LO) laser cannot be perfectly aligned. This misalignment, especially after transmission through the atmospheric channel, leads to further degradation of the signal’s coherence, resulting in frequency offsets and phase distortions, and introducing phase noise.
To address these issues, this study uses a spatial diversity reception simulation platform for a 10 GBaud PM 16QAM free-space optical (FSO) system, as shown in Figure 7. The simulation uses different values of the atmospheric refractive-index structure constant ($C_n^2$) to represent different turbulence intensities, one for strong turbulence and one for weak turbulence. To model the impact of laser manufacturing quality, different frequency offset values are configured. The relevant configurations of the simulation platform are described below.
At the transmitter, the polarization beam splitter (PBS) splits the external carrier laser with a linewidth of 50 kHz into two orthogonal polarizations. Subsequently, 16QAM data symbols are generated from the PRBS, corresponding to the real and imaginary parts of the X and Y polarizations. They are then converted into electrical signals using a digital-to-analog converter (DAC), and the two orthogonally polarized optical signals are modulated using a dual-polarization IQ modulator. Finally, the signal is combined into a single signal beam using a polarization beam combiner and emitted into a simulated free space turbulent channel. The free space turbulent channel is simulated using a Fourier transform-based phase screen model, assuming that the outer scale of the atmospheric turbulence tends to infinity and the inner scale tends to zero. The range of atmospheric turbulence is 10 km. The light field distribution before passing through the phase screen is shown in
Figure 7a, and the light field distribution after passing through strong and weak atmospheric turbulences is shown in
Figure 7b,c, respectively. The weak and strong turbulence cases differ in the value of the refractive-index structure constant [24].
At the receiver, multiple receiving telescopes collect independent, attenuated optical signals and couple them into a single-mode optical fiber. The received optical field is mixed with a local oscillator (LO) laser, and photoelectric conversion is performed by balanced photodiodes (BPDs). All receiving branches share a common LO laser with a linewidth of 50 kHz and an optical power of 7.5 dBm. After I/Q imbalance compensation, frame synchronization, and diversity branch phase correction have been performed on each branch, the branches are combined using maximum ratio combining (MRC). The combined signal is processed by a single shared DSP chain, which reduces the complexity of the diversity receiver. All simulations in this paper use four-branch MRC reception. The main modules of the simulation system are I/Q imbalance recovery, diversity branch phase correction, MRC, polarization demultiplexing, TTQP carrier FOE, carrier phase recovery, decision-directed least-mean-square (DD-LMS), and bit error rate (BER) counting [30]. Detailed parameters of the simulation environment are provided in Table 1.
4. Simulation Results and Discussion
With four-branch MRC reception, the relationship between the normalized mean square error (NMSE) and the received optical power was investigated for different symbol block lengths on the back-to-back (B2B) platform. The NMSE is computed from the estimated frequency offset, the true frequency offset, and the symbol duration.
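One common way to compute such a metric is shown below. The exact expression used in the paper is not reproduced here, so the definition in this sketch is an assumption: the mean of the squared frequency offset error after normalizing it by the symbol rate. The function name `nmse` and the sample estimates are hypothetical.

```python
import numpy as np

def nmse(estimates_hz, true_offset_hz, symbol_period):
    """Mean squared frequency-offset error, normalised by the symbol rate (assumed definition)."""
    normalised_error = (np.asarray(estimates_hz) - true_offset_hz) * symbol_period
    return float(np.mean(normalised_error ** 2))

# Two estimates that miss a 50 MHz offset by 100 kHz at 10 GBaud.
value = nmse([50.1e6, 49.9e6], 50e6, 1e-10)
```

With these numbers the normalized error is 1e-5 per estimate, giving an NMSE of about 1e-10. Multiplying by the symbol duration makes the metric dimensionless and comparable across symbol rates.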
Figure 8 illustrates that, with constant received optical power, the normalized mean square error (NMSE) decreases as the symbol block length increases. However, the reduction in NMSE is not substantial when increasing the symbol block length from 1024 to 1536. The NMSE curves for symbol block lengths of 1024 and 1536 show minimal differences, and as the received optical power increases, these two curves exhibit similar trends.
Theoretically, longer symbol blocks can enhance estimation accuracy. However, excessively long symbol blocks may lead to increased resource usage and delays. Shorter symbol blocks exhibit limited performance in smoothing additive noise, but offer advantages such as faster frequency offset tracking and reduced resource consumption and latency. This algorithm aims to provide offline design and validation for hardware implementation; therefore, considering both algorithm performance and complexity, a symbol block length of 1024 was selected.
Figure 9 compares the proposed algorithm with traditional QPSK partitioning under B2B conditions with four-branch reception, in terms of both NMSE and BER. Figure 9a presents the NMSE curves of the two algorithms at different average received optical powers, and Figure 9b shows the corresponding BER curves. Figure 9a shows that, at the same average received optical power, the proposed algorithm has a lower NMSE, indicating higher estimation accuracy. As the average received optical power increases, its estimation accuracy continues to improve, and its NMSE reaches a lower order of magnitude than that achieved by traditional QPSK partitioning, whose estimation accuracy leaves room for improvement. Figure 9b shows that the BER decreases as the average received optical power increases. At a BER of $3.8 \times 10^{-3}$ (the hard-decision FEC threshold with 7% overhead), the proposed scheme requires 0.67 dB less average received optical power than traditional QPSK partitioning, which improves receiver sensitivity.
Figure 10 illustrates the performance comparison between the proposed algorithm and QPSK partitioning under weak turbulence conditions, utilizing four-branch reception. The algorithm performance is comprehensively compared from both NMSE and BER perspectives.
Figure 10a presents the NMSE curve comparison for the two algorithms under different average received optical power conditions, while
Figure 10b shows the BER curve comparison for the two algorithms under the same conditions.
From
Figure 10a, it can be observed that the proposed algorithm has a lower NMSE, indicating higher estimation accuracy and superior performance compared to the QPSK partitioning algorithm, consistent with the conclusions drawn under B2B conditions.
Figure 10b compares the average received optical power that each algorithm requires to reach the FEC decision threshold under weak turbulence conditions. The proposed scheme reaches the FEC threshold with 0.76 dB less average received optical power than traditional QPSK partitioning. Thus, the proposed algorithm enhances receiver sensitivity even under weak turbulence conditions.
Figure 11 compares the proposed algorithm with QPSK partitioning under strong turbulence conditions with four-branch reception. Figure 11a presents the NMSE curves of the two algorithms at various average received optical powers, and Figure 11b shows the BER curves under the same conditions. As shown in Figure 11a, the proposed algorithm exhibits a significant advantage under strong turbulence conditions compared to B2B and weak turbulence scenarios, with notably higher estimation accuracy at the same received optical power. Under strong turbulence, the proposed algorithm also maintains better receiver sensitivity than QPSK partitioning: at the FEC threshold, it reduces the required average received optical power by 0.71 dB.
Next, we compare the estimation accuracy and estimation range of the two algorithms under different frequency offsets. This comparison assesses how robust and adaptable each algorithm is to frequency offset variations, and therefore whether it can maintain good performance in practical applications.
Figure 12 shows the NMSE performance curves of the proposed algorithm and QPSK partitioning under different frequency offsets in a four-branch received MRC scenario, for both weak and strong turbulence environments.
Figure 12a shows a comparison of the NMSE curves under weak turbulence with an average received optical power of −16 dBm, while
Figure 12b illustrates the NMSE performance comparison under strong turbulence conditions, with an average received optical power of −18 dBm.
As illustrated in Figure 12a, the NMSE of the proposed algorithm is consistently lower than that of the traditional QPSK partitioning algorithm across the tested frequency offsets. For frequency offsets within (−1 GHz, 1 GHz), the NMSE of the proposed algorithm is of a lower order of magnitude than that of the traditional QPSK partitioning algorithm, which indicates its superior estimation accuracy.
Figure 12b further presents the results under strong turbulence, where the proposed algorithm continues to maintain higher estimation accuracy, consistent with the findings under weak turbulence. Overall,
Figure 12 clearly demonstrates that the proposed algorithm consistently delivers better estimation accuracy across different turbulence conditions, highlighting its robustness and wide applicability in diverse environmental scenarios.
6. Conclusions
The simulation results demonstrate that, compared to the traditional QPSK partitioning algorithm, the proposed algorithm improves receiver sensitivity by 0.67 dB, 0.76 dB, and 0.71 dB under B2B, weak turbulence, and strong turbulence conditions, respectively. Under both weak and strong turbulence conditions, the proposed algorithm achieves a lower NMSE than the traditional algorithm, indicating superior estimation accuracy. Additionally, the proposed algorithm simplifies the frequency offset estimation stage by using absolute value operations instead of multiplications, which significantly reduces algorithm complexity relative to the traditional QPSK partitioning algorithm. The parallel design of the proposed algorithm balances performance and complexity, making it well suited to real-time hardware implementation. This approach enhances DSP processing capabilities in real-time hardware systems and holds significant application value in coherent FSO communication systems.