1. Introduction
Orbital angular momentum (OAM) has attracted research interest as a means of improving the transmission efficiency of optical communication [1]. Recently, stable data transmission at the Tbit/s level has been achieved in the laboratory by multiplexing OAM beams [2]. OAM multiplexing can be combined with polarization multiplexing and wavelength division multiplexing to increase the transmission rate [3], and the rate can be improved further by combining OAM multiplexing with multiple-input multiple-output (MIMO) spatial multiplexing [4]. In addition, MIMO equalization techniques can be applied to OAM-multiplexed free-space optical (FSO) communication systems. A constant modulus blind equalization algorithm has been used to mitigate the bit error rate (BER) degradation that atmospheric turbulence causes in OAM multiplexing systems [5], and it can be combined with optimal mode selection strategies to maintain communication quality at high data rates [6]. However, in FSO systems, refractive index variations caused by atmospheric turbulence produce random amplitude fluctuations and phase distortions in the transmitted laser beam, which degrade system performance [7].
Adaptive optics (AO) can improve the performance of communication systems by correcting the dynamic wavefront aberrations of the beam [8]. Wavefront sensorless AO systems use the light intensity information directly to design control algorithms that generate the control signals required by the wavefront corrector; that is, the wavefront is reconstructed from the aberrated light intensity image [9]. Classical wavefront reconstruction methods for sensorless AO include the Gerchberg–Saxton (GS) phase retrieval algorithm, the stochastic parallel gradient descent algorithm, the simulated annealing algorithm, and the genetic algorithm [10]. Most of these rely on iterative computation, which makes real-time wavefront reconstruction difficult.
Recently, deep learning has been widely applied in AO. Deep learning-based wavefront sensorless methods take the light intensity image captured by a charge-coupled device (CCD) camera as the input of a neural network and produce the wavefront aberration or the Zernike coefficients as the output; the output is then converted into a control signal that drives the deformable mirror to correct the wavefront. Paine et al. [11] applied a deep learning approach to the point spread function to achieve wavefront reconstruction. Tian et al. [12] recovered the wavefront with a deep neural network model to address the large number of iterations required by existing search algorithms. These studies demonstrated the effectiveness of deep neural network models. To reduce the computational complexity, Nishizaki et al. [13] estimated the wavefront Zernike coefficients directly from a single light intensity image using a convolutional neural network and improved the estimation performance through preprocessing. Zhai et al. [14] also used a convolutional neural network to reduce the computation time. Ma et al. [15], inspired by the phase diversity approach, used the light intensity maps in the focal and out-of-focus planes as the network input and the wavefront Zernike coefficients as the output. Ma et al. [16] also explored the effect of the consistency of the training data on wavefront recovery performance. Wu et al. [17], also following the idea of phase diversity, used a convolutional neural network to establish a mapping from the real image to its Zernike coefficients. Zhang et al. [18] used residual networks to obtain the wavefront Zernike coefficients and verified the robustness of deep residual networks.
The above literature shows that deep convolutional neural networks outperform traditional algorithms in wavefront sensorless AO systems, but they still perceive and represent light spot features inadequately. As the network deepens, vanishing gradients, exploding gradients and overfitting become more likely, which makes accurate real-time reconstruction difficult. Too many pooling operations can also discard information about small targets, resulting in excessive reconstruction errors.
2. Vortex Beam Atmospheric Transport Model
The schematic diagram of the OAM-multiplexed communication system with wavefront correction is shown in Figure 1. The system starts with n independent binary data streams. An optical modulator performs baseband modulation, placing each user’s message onto a Gaussian optical carrier, and a separate OAM mode converter then loads each carrier with a different OAM mode. The n OAM signals are multiplexed by superposition and are distorted as they pass through atmospheric turbulence. To reduce the effects of mode crosstalk and turbulence, wavefront correction is introduced to improve the quality of the received beam. Afterward, the OAM beams are demultiplexed and converted back to Gaussian beams by inverse converters, and finally an optical demodulator recovers the binary signals.
In free-space optical communication, a vortex beam differs from an ordinary beam in that it has a continuous helical wavefront, and its phase is undefined on the propagation axis, where the phase singularity is located. The Laguerre–Gauss (LG) beam is a representative vortex beam, and its optical field in free space is given by the following equation [19].
where $p$ denotes the radial index of the LG beam; $r$ is the radial distance from the spatial point to the transmission axis; $\theta$ is the azimuth angle; $w(z) = w_0\sqrt{1 + (z/z_R)^2}$, where $w_0$ is the beam waist radius; $L_p^{|l|}$ is the Laguerre polynomial; $l$ is the topological charge of the vortex beam, which can be taken as an integer; $k$ is the wavenumber; and $z_R$ is the Rayleigh distance.
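To make the quantities listed above concrete, the following minimal sketch evaluates one commonly used form of the LG field at a single point. It is an illustration under stated assumptions, not the expression from [19]: the function names `generalized_laguerre` and `lg_field`, the default parameter values, the normalization, and the sign convention (an azimuthal factor of exp(+i l θ)) are all chosen here for illustration and may differ from the paper's.

```python
import cmath
import math


def generalized_laguerre(p, alpha, x):
    """Generalized Laguerre polynomial L_p^alpha(x) for integer alpha >= 0."""
    total = 0.0
    for s in range(p + 1):
        # Explicit sum: (-1)^s * C(p + alpha, p - s) * x^s / s!
        total += math.comb(p + alpha, p - s) * (-x) ** s / math.factorial(s)
    return total


def lg_field(r, theta, z, l, p=0, w0=1.0, wavelength=1.0):
    """Complex LG field at (r, theta, z); all lengths share one unit.

    Assumed convention: exp(+i*l*theta) azimuthal phase, Gouy phase
    subtracted, constant exp(i*k*z) factor omitted.
    """
    k = 2.0 * math.pi / wavelength            # wavenumber
    z_r = math.pi * w0 ** 2 / wavelength      # Rayleigh distance
    w = w0 * math.sqrt(1.0 + (z / z_r) ** 2)  # beam radius at distance z
    abs_l = abs(l)

    norm = math.sqrt(2.0 * math.factorial(p)
                     / (math.pi * math.factorial(p + abs_l)))
    amplitude = (norm / w
                 * (math.sqrt(2.0) * r / w) ** abs_l
                 * generalized_laguerre(p, abs_l, 2.0 * r * r / (w * w))
                 * math.exp(-r * r / (w * w)))

    curvature = k * r * r * z / (2.0 * (z * z + z_r * z_r))
    gouy = (2 * p + abs_l + 1) * math.atan2(z, z_r)
    return amplitude * cmath.exp(1j * (curvature - gouy + l * theta))
```

For any nonzero topological charge the amplitude vanishes on the axis (r = 0), which is the phase singularity described in the text, and the phase advances by 2πl over one full turn in θ.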
When the light wave passes through each thin phase screen, only its phase changes while its amplitude remains the same. Under the Rytov approximation, the optical field at a transmission distance of $z + \Delta z$ can be obtained, as shown in the following equation [20].
where $\mathcal{F}^{-1}$ and $\mathcal{F}$ denote the inverse Fourier transform and the Fourier transform, respectively; $\varphi(x, y)$ denotes the atmospheric turbulence phase screen function; and $k_x$ and $k_y$ denote the spatial wavenumbers in the $x$ and $y$ directions, respectively.
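As a sketch of the relation described above, one widely used single-step form of propagation through a thin phase screen followed by free-space diffraction is given below. The symbols $u$, $\Delta z$, $\varphi$, $k_x$, $k_y$ and $k$ are assumed here for illustration, a constant phase factor $e^{ik\Delta z}$ is omitted, and the sign convention used in [20] may differ.

```latex
u(x, y, z + \Delta z)
  = \mathcal{F}^{-1}\left\{
      \mathcal{F}\left[ u(x, y, z)\, e^{i\varphi(x, y)} \right]
      \exp\left( -i\, \frac{k_x^{2} + k_y^{2}}{2k}\, \Delta z \right)
    \right\}
```

In this form the phase screen multiplies the field in the spatial domain, and the paraxial free-space transfer function is applied in the spatial-frequency domain.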
Therefore, only the atmospheric turbulence phase screen needs to be simulated in order to calculate the optical field at the receiving end. In this paper, the Zernike polynomials defined by Noll are chosen to model the turbulent phase screen $\varphi$, which can be expressed by the following equation.
where $r$ and $\theta$ are polar coordinates, and $n$ and $m$ are the radial order and the azimuthal frequency, respectively; they are always integers and satisfy the following condition: $m \le n$, with $n - |m|$ even.
The first Zernike term represents piston (a uniform phase shift), which has no effect on imaging quality. Higher-order terms represent higher spatial-frequency components; in this paper the expansion is truncated at the first 36 terms to generate a realistic atmospheric turbulence phase screen.
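The Zernike expansion described above can be sketched in a few lines. The example below evaluates Noll-ordered Zernike polynomials on the unit disk and sums them into a phase value. The function names (`noll_to_nm`, `radial_polynomial`, `zernike`, `phase_screen`) are hypothetical, and the coefficients are supplied by the caller; how the paper draws them from turbulence statistics is not reproduced here.

```python
import math


def noll_to_nm(j):
    """Map a Noll index j >= 1 to (n, m); m > 0 is a cosine term, m < 0 a sine term."""
    if j < 1:
        raise ValueError("Noll index starts at 1")
    n = (math.isqrt(8 * (j - 1) + 1) - 1) // 2   # radial order
    p = j - n * (n + 1) // 2                     # position within order n
    k = n % 2
    m = ((p + k) // 2) * 2 - k                   # |m| has the same parity as n
    if m != 0 and j % 2 == 1:                    # odd j -> sine term
        m = -m
    return n, m


def radial_polynomial(n, m, r):
    """Zernike radial polynomial R_n^|m|(r)."""
    m = abs(m)
    total = 0.0
    for s in range((n - m) // 2 + 1):
        num = (-1) ** s * math.factorial(n - s)
        den = (math.factorial(s)
               * math.factorial((n + m) // 2 - s)
               * math.factorial((n - m) // 2 - s))
        total += num / den * r ** (n - 2 * s)
    return total


def zernike(j, r, theta):
    """Noll-normalized Zernike polynomial Z_j at polar point (r, theta), r in [0, 1]."""
    n, m = noll_to_nm(j)
    radial = radial_polynomial(n, m, r)
    if m == 0:
        return math.sqrt(n + 1) * radial
    if m > 0:
        return math.sqrt(2 * (n + 1)) * radial * math.cos(m * theta)
    return math.sqrt(2 * (n + 1)) * radial * math.sin(-m * theta)


def phase_screen(coeffs, r, theta):
    """Phase (rad) at one point; coeffs[0] multiplies Z_1 (piston)."""
    return sum(a * zernike(j, r, theta) for j, a in enumerate(coeffs, start=1))
```

With 36 coefficients the expansion covers every term up to radial order 7, since `noll_to_nm(36)` returns `(7, 7)`. Setting `coeffs[0]` to zero removes the piston term.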
Denoting the
ith input signal as
, each signal is carried by its corresponding OAM mode, so the output signal of each OAM state is denoted as
. The
n-way signal multiplexing yields the following equation.
After atmospheric transmission, assuming that the mode crosstalk and noise are independent of each other during transmission, the received signal can be expressed as the following equation.
where
denotes the beam after atmospheric turbulence.
During demultiplexing, external disturbances destroy the orthogonality between the vortex beams and cause mode crosstalk; the information carried on the kth path can be expressed by the following equation.
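The role of orthogonality in demultiplexing can be illustrated with a deliberately simplified sketch. It keeps only the azimuthal factor exp(i l θ) of each mode sampled on a ring, ignores the radial profile and noise, and uses a hypothetical aberration of the form a·cos θ. The function names `multiplex` and `demultiplex` are assumptions made for this example and do not reproduce the paper's signal model.

```python
import cmath
import math


def multiplex(symbols, n_samples=256, phase=None):
    """Superpose azimuthal modes on a ring.

    symbols: dict mapping topological charge l -> complex symbol.
    phase:   optional function theta -> phase aberration in rad.
    """
    samples = []
    for idx in range(n_samples):
        theta = 2.0 * math.pi * idx / n_samples
        value = sum(s * cmath.exp(1j * l * theta) for l, s in symbols.items())
        if phase is not None:
            value *= cmath.exp(1j * phase(theta))
        samples.append(value)
    return samples


def demultiplex(samples, l_k):
    """Project the ring samples onto mode l_k (discrete inner product)."""
    n = len(samples)
    total = 0j
    for idx, value in enumerate(samples):
        theta = 2.0 * math.pi * idx / n
        total += value * cmath.exp(-1j * l_k * theta)
    return total / n


# Undistorted case: each symbol is recovered and unused modes stay empty.
tx = {1: 1 + 0j, -2: -1 + 0j, 3: 1j, -5: -1j}
clean = multiplex(tx)
recovered = {l: demultiplex(clean, l) for l in tx}

# Distorted case: a single mode l = 1 leaks into its neighbour l = 2.
distorted = multiplex({1: 1 + 0j}, phase=lambda t: 0.8 * math.cos(t))
leak_into_2 = abs(demultiplex(distorted, 2))
```

Without the aberration the projection returns each transmitted symbol, because the azimuthal modes are orthogonal on the ring. With the aberration, part of the energy of mode 1 appears in mode 2, which is the mode crosstalk described in the text.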
4. Deep Learning-Based Wavefront Recovery Simulation Experiments
To verify the effectiveness of the residual attention network proposed in this paper, vortex beam transmission is simulated and wavefront recovery is tested at different turbulence intensities. The vortex beam undergoes phase distortion after passing through atmospheric turbulence, as shown by Equation (6). The Zernike coefficients are not statistically independent but follow Kolmogorov statistics, and their magnitudes are related to the $D/r_0$ ratio, where $D$ is the system aperture diameter and $r_0$ is the atmospheric coherence length. For systems with reception apertures up to 1 m [24],
can represent weak turbulence,
can represent medium turbulence, and
can represent strong turbulence. The rest of the parameters [
25] are set as shown in
Table 1.
In addition, to make the experiment more reasonable, a range of ratios was set at an interval of 2 to
. For each ratio, 5000 sets of random Zernike coefficients and the corresponding light intensity maps are generated as the dataset. The data are divided into training, validation and test sets in a ratio of 8:1:1, the training images are resized to 224 × 224 pixels, and the network outputs 36 Zernike coefficients. To make the comparison clearer, the following tests also generate the wavefront phase from the output Zernike coefficients and assess the network accuracy from the residual between the predicted phase and the true phase. The turbulence phase screens and light intensity distributions shown in Figure 4 can be generated from Equation (3) and the parameters in Table 1. In Figure 4a–c, the atmospheric turbulence phase screens simulated from Zernike coefficients at different turbulence intensities are displayed in degrees, and the peak-to-valley (PV) and root-mean-square (RMS) values in rad are given in the figure; larger values indicate larger phase screen fluctuations and worse atmospheric conditions.
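The 8:1:1 partition described above can be sketched as follows. This is a minimal illustration, assuming that the samples are shuffled once with a fixed seed before splitting; the function name `split_dataset` and the seed are hypothetical, and the paper does not state how its split was drawn.

```python
import random


def split_dataset(samples, ratios=(8, 1, 1), seed=0):
    """Shuffle samples and split them into train/validation/test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # reproducible shuffle (assumed)

    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]       # remainder goes to the test set
    return train, val, test


# Example: the 5000 samples generated for one ratio.
train, val, test = split_dataset(range(5000))
```

For 5000 samples this yields 4000 training, 500 validation and 500 test samples, with no sample appearing in more than one subset.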
The simulations are implemented in Python with the Keras deep learning library. The network is trained offline for 100 epochs with a batch size of 50. The Adam optimizer, which adapts the learning rate, is used with an initial learning rate of 0.001, and the learning rate is reduced to 0.0001 when the network accuracy has not improved within 10 batches. A dropout layer is used in the fully connected layer to prevent overfitting, and the tanh function is used as the network activation function. Based on a comprehensive evaluation of the experiments, the weights of the individual terms of the loss function in this paper are set to
,
,
. The loss and accuracy during training are shown in Figure 5.
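The learning-rate rule described above (start at 0.001 and drop to 0.0001 once the accuracy has stopped improving for 10 consecutive checks) can be sketched without any framework. The class name `PlateauSchedule` and its parameters are hypothetical, and whether a check happens per batch or per epoch is left to the caller. In Keras the same behaviour is normally obtained with the built-in `ReduceLROnPlateau` callback rather than hand-written code.

```python
class PlateauSchedule:
    """Reduce the learning rate when the monitored accuracy stops improving."""

    def __init__(self, initial_lr=1e-3, factor=0.1, patience=10, min_lr=1e-4):
        self.lr = initial_lr
        self.factor = factor        # multiplicative decay applied on a plateau
        self.patience = patience    # non-improving checks tolerated
        self.min_lr = min_lr        # lower bound on the learning rate
        self.best = float("-inf")
        self.wait = 0

    def step(self, accuracy):
        """Record one accuracy value and return the learning rate to use next."""
        if accuracy > self.best:
            self.best = accuracy
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


# Example: the accuracy stalls at 0.5, so the rate drops after 10 stalled checks.
schedule = PlateauSchedule()
rates = [schedule.step(0.5) for _ in range(11)]
```

In this example the first call sets the best accuracy, the next nine calls leave the rate at 0.001, and the tenth stalled check reduces it to the floor of 0.0001.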
As shown in Figure 5a, the loss decreases as the number of iterations increases, and as shown in Figure 5b, the accuracy gradually increases and finally reaches about 96%. The loss (val_loss) and accuracy (val_acc) on the validation set follow those on the training set closely, which indicates that the network structure is reasonably designed and does not overfit, and its accuracy can reach about 97%.
To verify the effectiveness and robustness of the residual attention network model, five sets of Zernike coefficients were randomly generated at different turbulence intensities, and the corresponding phase screens are shown in Figure 6a. The Zernike coefficients were predicted by the hybrid attention network model proposed in this paper; the estimated phases and their residuals are shown in Figure 6b,c, and the results of the CNN-based method from the literature are shown in Figure 6d. The residual attention network model yields the smallest residual phases, whereas the predictions of the model that uses only a convolutional neural network are less accurate. The PV and RMS values of the residual phase are listed in Table 2. The residual phases obtained with the method in this paper have smaller PV and RMS values, so the reconstructed phase screen is closer to the true one than in previous work. Figure 6 and Table 2 therefore show that the hybrid attention network reconstructs the wavefront well at different turbulence intensities and that the recovered phase screen is similar to the actual phase screen.
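The PV and RMS figures of merit used above can be computed from a residual phase map as in the following minimal sketch. It assumes that the phase maps are given as nested lists of values in rad and that the RMS is taken about the mean of the residual, so that a constant offset does not count as error. The function names `residual_phase` and `pv_and_rms` are hypothetical, and the paper does not state whether it removes the mean.

```python
import math


def residual_phase(true_phase, predicted_phase):
    """Element-wise difference between two phase maps of equal shape."""
    return [[t - p for t, p in zip(row_t, row_p)]
            for row_t, row_p in zip(true_phase, predicted_phase)]


def pv_and_rms(phase_map):
    """Return (peak-to-valley, RMS about the mean) of a phase map in rad."""
    values = [v for row in phase_map for v in row]
    pv = max(values) - min(values)
    mean = sum(values) / len(values)
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return pv, rms


# Tiny worked example with a 2 x 2 phase map.
true_map = [[0.0, 1.0], [2.0, 3.0]]
pred_map = [[0.0, 0.5], [2.0, 2.5]]
pv, rms = pv_and_rms(residual_phase(true_map, pred_map))
```

Here the residual is [[0, 0.5], [0, 0.5]], so PV = 0.5 rad and RMS = 0.25 rad. In practice only the pixels inside the aperture would be included.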
Figure 7 compares the predicted and actual Zernike coefficients for five different turbulence intensities; the predictions of the method in this paper are closer to the actual coefficients.
After the wavefront is reconstructed from the predicted Zernike coefficients, the turbulent phase can be corrected by applying the inverse phase to the distorted beam. To verify the effectiveness of this inverse-phase correction with the residual attention network in the OAM-multiplexed communication system, the topological charges of the transmitted beams are fixed at 1, −2, 3, and −5, the signal-to-noise ratio (SNR) is set to 10, the modulation is quadrature phase-shift keying (QPSK), and all other conditions remain unchanged. The simulation is carried out at different turbulence intensities, and the results are shown in Figure 8. The system BER increases as the turbulence becomes stronger. The system corrected with the model in this paper has the lowest BER, lower than that of the CNN model, whereas the Gerchberg–Saxton phase retrieval algorithm improves the system performance only slightly. The system without phase correction, labeled "none" in the figure, has the highest BER, and its BER can hardly be brought below the forward error correction (FEC) limit.
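The link between residual phase error and BER can be illustrated with a toy Monte Carlo experiment. The sketch below is not the paper's OAM simulation: it models a single QPSK carrier over additive white Gaussian noise, treats the SNR as the symbol-energy-to-noise ratio in dB, and stands in for imperfect wavefront correction with one constant phase rotation. The function name `qpsk_ber` and all parameter values are assumptions made for this example.

```python
import cmath
import math
import random


def qpsk_ber(snr_db, residual_phase=0.0, n_symbols=20000, seed=0):
    """Estimate the QPSK bit error rate with a constant residual phase (rad)."""
    rng = random.Random(seed)
    snr = 10.0 ** (snr_db / 10.0)
    sigma = math.sqrt(1.0 / (2.0 * snr))   # noise std per quadrature, Es = 1
    rotation = cmath.exp(1j * residual_phase)

    errors = 0
    for _ in range(n_symbols):
        bit_i, bit_q = rng.getrandbits(1), rng.getrandbits(1)
        # Gray-mapped QPSK: bit 0 -> +1/sqrt(2), bit 1 -> -1/sqrt(2).
        symbol = complex(1 - 2 * bit_i, 1 - 2 * bit_q) / math.sqrt(2.0)
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        received = symbol * rotation + noise
        # Hard decision on the sign of each quadrature.
        errors += int((received.real < 0) != (bit_i == 1))
        errors += int((received.imag < 0) != (bit_q == 1))
    return errors / (2 * n_symbols)


ber_corrected = qpsk_ber(10.0, residual_phase=0.0)
ber_uncorrected = qpsk_ber(10.0, residual_phase=0.6)
```

A smaller residual phase leaves the constellation closer to its decision boundaries' centres, so the estimated BER for the well-corrected case is far lower than for the case with a 0.6 rad residual rotation, in line with the trend described in the text.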
To verify the robustness of the residual attention network model, simulations were performed at different signal-to-noise ratios and turbulence intensities, and the results are shown in Figure 9. The BER increases as the turbulence becomes stronger, but the model proposed in this paper outperforms the CNN model at every turbulence intensity. At a BER of $10^{-3}$, this model has a performance gain of about 1 dB over the CNN model. The signal corrected by the residual attention model therefore has better BER performance and reaches the FEC limit at a lower SNR than the CNN model.
Distance is one of the key factors affecting the performance of optical communication systems, and the system BER grows as the transmission distance increases. With the SNR fixed at 10 and all other simulation conditions unchanged, the methods were compared at different distances and turbulence intensities, and the results are shown in Figure 10. As the distance increases, the effect of turbulence becomes stronger and the system performance deteriorates. At all turbulence intensities considered, the system corrected using the model in this paper has the lowest BER, which indicates the robustness of the model. As seen in the figure, the residual attention model can also reach the FEC limit faster at an SNR of 10.
To verify the effectiveness of the hybrid attention structure proposed in this paper, several models were trained: the backbone network ResNet50, the network with CA added, the network with SA added, and the network with the hybrid attention structure added. The accuracy and running time of these models are compared in Table 3.
Table 3 shows that adding an attention mechanism improves the accuracy of the reconstructed Zernike coefficients, and that the hybrid attention structure achieves the highest accuracy, 97.3%, exceeding that of either single attention structure. The running time is the average over 1000 tests. The computation times of all networks with added attention are below 10 ms, which meets the requirement of real-time wavefront correction, and the running time can be reduced further as hardware performance improves. The running time of the hybrid attention structure is slightly longer than that of ResNet50, but the residual attention network proposed in this paper performs best when accuracy and running time are considered together.
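The averaging procedure for the running time can be sketched as follows. This is a generic timing helper, assuming that the inference step is wrapped in a zero-argument callable; the name `mean_runtime_ms` is hypothetical and the stand-in workload below is not a neural network.

```python
import time


def mean_runtime_ms(func, runs=1000):
    """Average wall-clock time of func() over a number of runs, in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Example with a placeholder workload instead of a model forward pass.
average_ms = mean_runtime_ms(lambda: sum(range(1000)), runs=1000)
```

When timing a real model, a few warm-up calls before the measured loop usually give a more representative average, because the first calls often include one-off initialization costs.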
5. Conclusions
In this paper, to address the vortex beam distortion caused by atmospheric turbulence, which degrades FSO communication quality, we propose generating the inverse wavefront phase for the AO system with a residual attention network, which achieves effective wavefront correction and thus reduces the system BER. The trained model establishes an accurate mapping between the aberrated light intensity distribution and the first 36 Zernike coefficients. Simulations were performed under different turbulence conditions, and the PV and RMS values of the wavefront residuals improved on those of previous studies, indicating the strong robustness of the network. At a BER of $10^{-3}$, the residual attention network has a performance gain of about 1 dB compared to the CNN model. In the case of , PV = 0.094 rad, RMS = 0.018 rad; in the case of , PV = 0.236 rad, RMS = 0.085 rad; in the case of , PV = 0.335 rad, RMS = 0.122 rad. The predicted Zernike coefficients are close to the actual coefficients, and the phases reconstructed from them are highly similar to the actual phases. The effectiveness of the hybrid attention structure in wavefront phase reconstruction is also verified: it achieves the highest accuracy with only a small increase in running time. The high accuracy, real-time performance, and flexibility of the residual attention network support the practical application of deep learning in AO systems.