1. Introduction
The optical phase carries critical information about the propagation of light waves and can be utilized to analyze the properties of objects by detecting changes in the optical wavefront phase. This technology has been widely applied in areas such as atmospheric turbulence detection, optical element defect analysis, and biological sample research [1,2,3]. Among these, single-shot focal plane wavefront recognition is a technique that reconstructs the incident wavefront from a single focal plane intensity image.
Traditional wavefront recognition methods mainly rely on the two-dimensional intensity distribution of the diffracted light field to reconstruct wavefront aberrations. In 1972, Gerchberg and Saxton introduced the GS algorithm, which uses Fourier transforms to iteratively compute the intensity distributions of the far field and near field, thus recovering the wavefront phase [4]. The GS algorithm has a simple structure, high energy concentration, and can detect higher-order aberrations. However, it suffers from low inversion precision, a tendency to converge to local optima, and a fundamental multiple-solution ambiguity: the near-field complex amplitude and its 180°-rotated conjugate produce the same far-field intensity distribution [5].
In 1979, Gonsalves and Chidlaw introduced the phase diversity (PD) method for wavefront reconstruction. This method uses two images, one in focus and one defocused, to iteratively recover the wavefront phase by combining intensity and defocus information. While the PD method resolves the multiple-solution issue by adding a constraint, it complicates the optical system and increases computational demands [6]. In 1978, Fienup introduced the hybrid input-output (HIO) algorithm, which alternates between projections onto object-domain and frequency-domain constraint sets to reconstruct the image. Building on the GS algorithm, the HIO algorithm improves convergence speed, especially in coherent diffraction imaging, but it still fails to resolve the multiple-solution issue [7]. Fienup later proposed using asymmetric apertures to break the system's symmetry [8]. Although this approach effectively removes the ambiguity, it also increases the complexity of the optical design and requires more precise alignment for optimal performance. In 2019, Kong Qingfeng and colleagues at the Chinese Academy of Sciences proposed an improved GS algorithm based on Walsh function modulation for wavefront reconstruction, eliminating the 180° rotational symmetry of the modulation phase and successfully recovering the wavefront phase [9]. However, it remains an iterative method and still suffers from limited inversion accuracy and convergence difficulties.
Neural networks have also been applied to wavefront reconstruction, and recent advances in deep learning have opened new approaches. As early as 1992, Jorgenson and Aitken proposed using adaptive linear predictors and backpropagation neural networks (NNs) to predict wavefront distortions; their neural network predictors achieved a mean squared error (MSE) roughly half that of linear systems [10]. In 2020, Liu et al. applied deep learning to wavefront phase prediction in open-loop adaptive optics, using a long short-term memory (LSTM)-based neural network to compensate for control-loop delays. Their approach outperformed traditional methods by providing stable and accurate wavefront predictions under varying turbulence conditions, highlighting the potential of neural networks for real-time wavefront phase recovery [11]. However, the lack of model interpretability raises concerns about robustness in practical applications. In 2023, Chen et al. introduced PWFS-ResUnet, a network that reconstructs wavefront phase maps from plenoptic wavefront sensor (PWFS) slope measurements, effectively mitigating nonlinear issues in traditional approaches [12]. Building upon attention mechanisms [13,14], Feng et al. (2023) proposed the SH-U-Transformer, a Transformer-based network that directly reconstructs wavefront distributions from Shack–Hartmann wavefront sensor (SHWFS) spot-array images [15]. In 2024, Zhang et al. presented ADSA-Net, which integrates an additive self-attention (ADSA) module to significantly enhance both accuracy and efficiency in multi-aperture object feature recognition [16]. Similarly, Hu et al. (2024) developed an automated detection system based on an improved attention U-Net, enabling fast and accurate micropore defect detection in composite thin-film materials [17]. Meanwhile, Zhao et al. combined adaptive image processing with convolutional neural networks incorporating a simple, parameter-free attention module (SimAM), greatly improving the performance and robustness of superimposed orbital angular momentum (OAM) beam topological charge identification under turbulent conditions [18]. Finally, Kazemzadeh et al. improved image transmission fidelity through turbulent media by introducing a global attention mechanism (GAM) [19].
To address the limitations of existing methods—such as low inversion precision, convergence challenges, system complexity, and multiple solutions—we propose a novel single-frame focal plane wavefront recognition method based on the Transformer architecture. By utilizing the feature extraction and sequence modeling strengths of Transformers, our approach aims to improve upon traditional iterative methods and deep learning-based techniques.
This study explores the proposed method, covering its theoretical foundation, algorithm design, and experimental validation. The results show improvements in wavefront reconstruction accuracy and demonstrate the system’s simplicity, as well as its ability to resolve multi-solution issues, making it a promising solution for optical wavefront sensing and imaging.
3. Development of Neural Network Systems
Convolutional neural networks (CNNs) have shown exceptional performance in image processing, particularly in recognizing complex features and capturing local details. However, their ability to integrate global context remains limited. In contrast, Transformer architectures excel in natural language processing (NLP), with superior capabilities in sequence data processing and global information aggregation. To address this gap, we developed a Transformer-based neural network system that accepts far-field focal plane images as input and outputs the first 36 Zernike polynomial coefficients representing near-field wavefront aberrations, thereby enabling accurate reconstruction of incident optical wavefronts. The experimental workflow is shown in Figure 3:
The core architecture of our neural network is based on the Convolutional Attention Network (CoAtNet) [23], with comparative implementations using MobileViT (a lightweight vision Transformer) [24] and ResNet34. The CoAtNet architecture combines Transformer principles with convolutional operations, enhancing both expressive power and computational efficiency. As shown in Figure 4, the network uses a sequential arrangement of convolutional blocks and self-attention modules for automated feature extraction through deep learning.
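To make the input-output structure concrete, the following PyTorch sketch shows a simplified hybrid network of this kind: convolutional stages for local feature extraction, self-attention stages for global context aggregation, and a regression head producing the 36 Zernike coefficients. It is an illustrative stand-in for the CoAtNet backbone, not the configuration used in this work; the class names, stage widths, and depths are assumptions.

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    # simplified stand-in for CoAtNet's convolutional (MBConv) stages
    def __init__(self, c_in, c_out, stride=2):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.GELU(),
            nn.Conv2d(c_out, c_out, 3, 1, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.GELU(),
        )

    def forward(self, x):
        return self.block(x)

class HybridWavefrontNet(nn.Module):
    # conv stages (local detail) -> Transformer stages (global context) -> 36 Zernike coefficients
    def __init__(self, dim=128, depth=4, heads=4, n_zernike=36):
        super().__init__()
        self.stem = ConvBlock(1, 32)     # 256 -> 128
        self.s1 = ConvBlock(32, 64)      # 128 -> 64
        self.s2 = ConvBlock(64, dim)     # 64  -> 32
        enc = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                         dim_feedforward=4 * dim, batch_first=True)
        self.s3 = nn.TransformerEncoder(enc, depth)   # self-attention stages
        self.head = nn.Linear(dim, n_zernike)

    def forward(self, x):                       # x: (B, 1, 256, 256) focal-plane image
        f = self.s2(self.s1(self.stem(x)))      # (B, dim, 32, 32) feature map
        seq = f.flatten(2).transpose(1, 2)      # (B, 1024, dim) token sequence
        seq = self.s3(seq)
        return self.head(seq.mean(dim=1))       # (B, 36) Zernike coefficients

print(HybridWavefrontNet()(torch.randn(2, 1, 256, 256)).shape)   # torch.Size([2, 36])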
To improve network performance, we added a Normalization-based Attention Module (NAM) between stages S2 and S3 of the CoAtNet framework and conducted ablation studies against the baseline architecture. The NAM module uses channel attention to recalibrate channel-wise weights, applying batch normalization to compute channel importance scores and modulating feature maps through sigmoid-activated weight allocation [25].
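A minimal PyTorch sketch of this channel-attention branch, following the description above (the class name and placement call are illustrative; see [25] for the exact NAM formulation):

import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    # Channel attention driven by batch-normalization scale factors (gamma):
    # channels with larger |gamma| receive larger importance scores.
    def __init__(self, channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=True)

    def forward(self, x):
        residual = x
        x = self.bn(x)
        weight = self.bn.weight.abs() / self.bn.weight.abs().sum()   # channel importance scores
        x = x * weight.view(1, -1, 1, 1)
        return residual * torch.sigmoid(x)                           # sigmoid-activated reweighting

# inserted between CoAtNet stages S2 and S3: features = NAMChannelAttention(channels)(features)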
We also implemented the MobileViT and ResNet34 architectures as comparative benchmarks. The MobileViT core module consists of Unfold-Transformer-Fold operations, where the Unfold/Fold steps reshape feature tensors into and out of the token sequences processed by the Transformer. Our configuration uses three cascaded MobileViT blocks interleaved with convolutional and fully connected layers, as shown in Figure 5. The ResNet34 architecture serves as a standard deep residual network, providing baseline performance for comparison.
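As an illustration of the Unfold-Transformer-Fold pattern, the sketch below implements a single simplified MobileViT-style block in PyTorch. The patch size, depth, and channel width are placeholder values, and the actual MobileViT blocks [24] additionally use depthwise convolutions and separate projection layers.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniMobileViTBlock(nn.Module):
    def __init__(self, channels=64, patch=2, depth=2, heads=4):
        super().__init__()
        self.local_rep = nn.Conv2d(channels, channels, 3, padding=1)   # local representation
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.patch = patch

    def forward(self, x):
        y = self.local_rep(x)
        B, C, H, W = y.shape
        p = self.patch
        # Unfold: split the feature map into non-overlapping p x p patches and regroup pixels
        # so that the Transformer attends across patches (global mixing).
        seq = F.unfold(y, kernel_size=p, stride=p)                      # (B, C*p*p, L)
        L = seq.shape[-1]
        seq = seq.view(B, C, p * p, L).permute(0, 2, 3, 1).reshape(B * p * p, L, C)
        seq = self.transformer(seq)
        # Fold: reassemble the token sequence back into a feature map of the original size.
        y = seq.reshape(B, p * p, L, C).permute(0, 3, 1, 2).reshape(B, C * p * p, L)
        y = F.fold(y, output_size=(H, W), kernel_size=p, stride=p)
        return self.fuse(torch.cat([x, y], dim=1))                      # fuse with the input

block = MiniMobileViTBlock()
print(block(torch.randn(1, 64, 32, 32)).shape)                          # torch.Size([1, 64, 32, 32])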
The network processes 256 × 256 single-channel grayscale images as input and generates 36 × 1 output vectors representing the 1st to 36th order Zernike polynomial coefficients. A comparative analysis of the architectural complexity and parameter counts for the four networks is presented in Table 2.
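As a concrete example of this input/output adaptation, a torchvision ResNet34 can be modified to accept single-channel 256 × 256 images and output 36 coefficients, and its parameters counted for a Table 2-style comparison (a sketch; the exact ResNet34 configuration used here may differ):

import torch
import torchvision

model = torchvision.models.resnet34(weights=None)                                      # no pretraining
model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)   # single-channel input
model.fc = torch.nn.Linear(model.fc.in_features, 36)                                   # 36 Zernike coefficients

x = torch.randn(1, 1, 256, 256)
print(model(x).shape)                                # torch.Size([1, 36])
print(sum(p.numel() for p in model.parameters()))    # total parameter count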
4. Experiments and Analysis
The simulation used power spectrum inversion to generate 100,000 atmospheric phase screens with randomly sampled atmospheric coherence lengths (0.03–0.15 m), turbulence parameters (outer scale: 100 m; inner scale: 0.1 m), and optical specifications (wavelength: 1064 nm; pupil diameter: 54 mm; focal length: 50 mm). Ground-truth labels, the first 36 Zernike polynomial coefficients of each screen, were obtained by Moore–Penrose pseudoinverse fitting. Phase modulation was implemented using the W4, W5, and W12 Walsh functions, followed by near-field to far-field intensity conversion via Fresnel diffraction, yielding 100,000 far-field intensity maps (90,000 for training, 10,000 for validation).
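The sketch below illustrates this sample-generation pipeline in NumPy under simplifying assumptions: a Fraunhofer (single-FFT) approximation of the focal-plane intensity in place of full Fresnel propagation, a toy four-term Zernike-like basis instead of the 36-term set, and an illustrative 0/π quadrant mask standing in for the W4/W5/W12 Walsh plates; all variable names are placeholders.

import numpy as np

N = 256
y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]
r2 = x**2 + y**2
pupil = (r2 <= 1.0).astype(float)

# toy aberration: random combination of tilt, defocus and astigmatism terms
rng = np.random.default_rng(42)
basis = np.stack([x, y, 2 * r2 - 1, x**2 - y**2])             # truncated Zernike-like basis (4, N, N)
true_coeffs = rng.normal(scale=0.5, size=4)
phi = np.tensordot(true_coeffs, basis, axes=1)                # pupil-plane phase [rad]

# ground-truth label: least-squares Zernike fit via the Moore-Penrose pseudoinverse
Z = basis.reshape(4, -1)[:, pupil.ravel() > 0].T              # (pixels inside pupil, 4)
labels = np.linalg.pinv(Z) @ phi.ravel()[pupil.ravel() > 0]   # recovers true_coeffs

# Walsh-type binary phase modulation and far-field (focal-plane) intensity: the network input
walsh = np.pi * ((x > 0) & (y > 0)).astype(float)
field = pupil * np.exp(1j * (phi + walsh))
intensity = np.abs(np.fft.fftshift(np.fft.fft2(field)))**2

print(labels.round(3), intensity.shape)                        # label vector and (256, 256) image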
The computational platform used an NVIDIA A100 GPU (40 GB memory, NVIDIA, Santa Clara, CA, USA) and an Intel Xeon Gold 6338 CPU (2.00 GHz, Intel, Santa Clara, CA, USA). Training ran for up to 300 epochs with a batch size of 256 and adaptive learning rate scheduling (initial rate: 0.01); optimization stopped at 300 epochs or upon loss convergence, with the RMSE between predicted and ground-truth Zernike coefficients as the loss metric.
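A minimal training-loop sketch consistent with this protocol is given below; the Adam optimizer and ReduceLROnPlateau scheduler are assumptions, since only an adaptive learning-rate schedule with an initial rate of 0.01 is specified.

import torch
from torch.utils.data import DataLoader

def rmse_loss(pred, target):
    # RMSE between predicted and ground-truth Zernike coefficient vectors
    return torch.sqrt(torch.mean((pred - target) ** 2))

def train(model, train_set, val_set, device="cuda"):   # use "cpu" if no GPU is available
    model = model.to(device)
    loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)
    val_loader = DataLoader(val_set, batch_size=256)
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=10)
    for epoch in range(300):
        model.train()
        for images, coeffs in loader:                   # (B, 1, 256, 256), (B, 36)
            opt.zero_grad()
            loss = rmse_loss(model(images.to(device)), coeffs.to(device))
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(rmse_loss(model(im.to(device)), c.to(device)).item()
                      for im, c in val_loader) / len(val_loader)
        sched.step(val)                                 # adapt the learning rate on plateaus
        print(f"epoch {epoch}: validation RMSE {val:.4f}")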
Figure 6 shows the loss function trajectories across training iterations, while Table 3 presents the root mean square error (RMSE) of the Zernike coefficients on the validation set, highlighting performance variations among the evaluated models.
As shown by the training dynamics in Figure 6, all models reached performance saturation around epoch 200, with the W12-modulated datasets showing the lowest converged loss values. The quantitative metrics in Table 3 indicate that the NAM-CoAtNet model trained with W12-modulated data achieves the lowest root mean square error (RMSE = 0.0081) on the validation set, validating the effectiveness of our NAM-enhanced architecture in extracting discriminative features from diffraction-based far-field intensity patterns.
The reconstructed wavefronts derived from the inferred Zernike coefficients were evaluated using the normalized wavefront error (NWE), defined as the ratio of the root mean square (RMS) of the wavefront residual to the RMS of the original wavefront. This dimensionless metric provides a standardized error measure for cross-system comparative analysis, formulated as

NWE = \frac{\sqrt{\sum_{i=1}^{N} (\hat{W}_i - W_i)^2}}{\sqrt{\sum_{i=1}^{N} W_i^2}},

where \hat{W}_i denotes the reconstructed wavefront value at the i-th pixel, W_i represents the corresponding ground truth value at the i-th spatial position, and N is the number of pixels within the pupil. Quantitative evaluations were performed on 800 validation set images through systematic inference experiments, with the empirical results summarized in Table 4.
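In code, the metric is a direct transcription of this definition (a sketch; the optional mask restricting the computation to pupil pixels is an assumption):

import numpy as np

def normalized_wavefront_error(w_rec, w_true, mask=None):
    # NWE = RMS of the wavefront residual divided by RMS of the original wavefront
    if mask is not None:
        w_rec, w_true = w_rec[mask], w_true[mask]
    residual_rms = np.sqrt(np.mean((w_rec - w_true) ** 2))
    reference_rms = np.sqrt(np.mean(w_true ** 2))
    return residual_rms / reference_rms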
Table 4 shows the normalized wavefront error (NWE) of the Zernike coefficient predictions across the evaluated neural architectures on the validation set. The data reveal that the CoAtNet framework performs best with W12-modulated data (NWE: 7.2%), a reduction of 2.6 and 3.3 percentage points compared with ResNet34 (NWE: 9.8%) and MobileViT (NWE: 10.5%), respectively. The NAM-enhanced variant (NAM-CoAtNet) further reduces the error to 5.3% NWE, the best prediction accuracy among the evaluated models. Cross-modulation analysis shows that all three baseline architectures (ResNet34, MobileViT, and CoAtNet) achieve their minimal NWE values with W12 modulation, highlighting the importance of W12 phase encoding in improving Zernike coefficient estimation precision.
Notably, NAM-CoAtNet shows significant accuracy improvements over vanilla CoAtNet under both W5 and W12 modulation conditions. This performance gain validates the NAM module's ability to optimize channel attention, enabling prioritized processing of wavefront-critical features through adaptive feature recalibration (see Section 3).
Figure 7 compares the predicted and reference Zernike coefficients under W12 modulation, showing strong correlation (Pearson's r > 0.94) across modes Z16–Z36. These results validate the effectiveness of integrating convolutional and Transformer layers with attention mechanisms for diffraction feature extraction.
In total, 500 pairs of complex conjugate phase screens were generated, modulated using W12, and diffracted to obtain far-field intensity patterns. The trained model was then used to predict the wavefronts, where W represents the original wavefront and W* denotes the complex conjugate wavefront. The normalized wavefront error (NWE) between the reconstructed and original wavefronts was calculated, with the results summarized in Table 5. Additionally, images of the far-field diffraction patterns corresponding to the Zernike coefficients predicted by the NAM-CoAtNet network are shown in Figure 8.
The experimental results confirm that Walsh function-modulated far-field intensity patterns enable accurate near-field wavefront reconstruction through model inference. The NAM-CoAtNet architecture achieves optimal performance with normalized wavefront errors of 5.4% (original wavefront) and 6.3% (phase-conjugated counterpart), demonstrating its ability to distinguish complex-conjugate phase relationships through diffraction pattern analysis.
To assess practical applicability, we designed an offset-augmented dataset simulating 256 × 256 focal plane images with lateral displacements of 0–50 pixels from the optical axis. Using identical training protocols (90,000 training/10,000 validation samples), the system maintains stable reconstruction accuracy, as shown in Figure 9, with less than 8% NWE degradation under the maximum displacement condition.
As shown in Figure 9, the system preserves the original wavefront morphology and maintains consistent coefficient variation patterns, demonstrating robust feature-encoding capabilities against displacement.
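One plausible way to synthesize such laterally displaced focal-plane images (the exact augmentation procedure is not detailed above) is to shift each simulated image off the optical axis and zero-pad the exposed border, as in the sketch below.

import numpy as np

def shift_focal_image(image, dx, dy):
    # Shift the focal-plane image by (dy, dx) pixels without wrap-around,
    # emulating a lateral displacement of the spot from the optical axis.
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted

# e.g. a random displacement of up to 50 pixels:
# shift_focal_image(img, np.random.randint(0, 51), np.random.randint(0, 51))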
To address simulation-to-reality discrepancies, we established the systematic validation platform shown in Figure 10:
A 100 mW laser operating at 1064 nm was collimated by a beam expander to generate a stabilized beam. The collimated beam passed through a deformable mirror (wavefront corrector) programmed to simulate atmospheric turbulence-induced aberrations. A beam splitter then divided the beam into two paths: the transmitted beam was directed to a Shack–Hartmann wavefront sensor (36 × 36 lenslet array, μm resolution) for aberration measurement, while the reflected beam was modulated by a W5 Walsh phase plate and focused through a 54 mm aperture onto an imaging camera.
Figure 11 shows the light field detected by the Shack–Hartmann wavefront sensor (SHWS) following Walsh phase modulation, along with the far-field intensity distribution captured at the focal plane through the Walsh phase plate.
The experimental results are shown in Table 6:
According to Table 6, the normalized wavefront error (NWE) values of all models in the real-world experiments are consistently higher than in the simulations, with MobileViT and ResNet34 showing significantly worse performance. This discrepancy may arise from practical challenges in real-world settings, such as ambient light interference, instrumental errors, and measurement inaccuracies, which can compromise data quality and degrade results.
Among the models tested, NAM-CoAtNet shows the lowest NWE (7.7%) in experiments, indicating its stronger adaptability and robustness. However, its experimental NWE remains higher than the simulation results, highlighting the gap between ideal and practical conditions. Future work could focus on improving experimental protocols (e.g., environmental controls, sensor calibration) or enhancing model architectures to better handle real-world perturbations.