**1. Introduction**

Retroreflectors are passive devices that return the incident signal through the same propagation path. For our intended application on UAVs, a retroreflector is ideal due to its size and zero power consumption. Because this is a dynamic link (the distance between the transmitter and the point where the signal is reflected changes quickly or in discrete steps), signal degradation is expected due to atmospheric-turbulence-induced effects, but also due to the general nature of a propagating beam. To ameliorate these effects, we relied on low-order adaptive optics correction, in this case focus control. Due to the constraints in size, weight and power (SWaP), we have designed and fabricated an adaptive retroreflector which allows us to change the divergence of the beam in order to optimize the link, achieving higher link performance or longer distances than can normally be obtained with a passive system. This device enables the control of the divergence, which can be used to optimize the return signal in a monostatic configuration or to increase the return footprint of the beam in a bi-static, dynamic, or reconfigurable (moving) link, this latter case being the motivation for the following types of devices [1,2].
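The footprint argument can be made concrete with simple geometry. The following sketch uses a purely geometric spreading model with illustrative numbers (not values from this work):

```python
import math

def return_footprint_diameter(range_m, full_divergence_rad, exit_diameter_m=0.0):
    """Approximate diameter of the returned beam footprint at the receiver:
    the spot grows linearly with range at the full divergence angle, on top
    of the exit aperture diameter (geometric model only)."""
    return exit_diameter_m + 2.0 * range_m * math.tan(full_divergence_rad / 2.0)

# Widening the return divergence from 0.1 mrad to 0.5 mrad grows the
# footprint at 1 km from ~0.1 m to ~0.5 m, relaxing the pointing
# requirements for a moving (bi-static or reconfigurable) receiver.
d_narrow = return_footprint_diameter(1000.0, 0.1e-3)
d_wide = return_footprint_diameter(1000.0, 0.5e-3)
```

The trade-off is that a wider footprint spreads the same returned power over a larger area, so divergence control is an optimization, not a free gain.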

Adaptive optical devices (also known as active or tunable devices) are devices that can adjust their surface curvature (such as deformable mirrors, fluidic lenses, elastic/elastomeric solids) or modify their refractive index (such as liquid crystals) in order to change the optical properties of the element, such as its focal length.

**Citation:** Santiago, F.; Font, C.O.; Restaino, S.R.; Qadri, S.N.; Bagwell, B.E. Design, Fabrication and Characterization of an Adaptive Retroreflector (AR). *Photonics* **2022**, *9*, 124. https://doi.org/10.3390/photonics9030124

Received: 25 January 2022 Accepted: 15 February 2022 Published: 22 February 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

This led to the design and fabrication of an adaptive retroreflector—the one described in this paper is based on fluidic optical elements, for which NRL has extensive expertise [3–5].



#### **2. AR Design and Configurations**

Our adaptive retroreflector (AR) can be used with a corner cube retroreflector, solid or hollow, and consists of an optical fluid, encapsulated by an elastomeric membrane that can be deformed via an actuator—in this case, the same actuator we use for our adaptive polymer lenses. This actuator and its electronics have been designed for tactical applications in which SWaP is key, making this ideal for UAV applications.

Depending on the application, both the membrane and fluid can be replaced with an elastomeric optical polymer (which can be made from the same material as the optical membrane) that can be deformed mechanically to make the adaptive retroreflector. This device enables control of the divergence.

The device can be fabricated in two ways: (1) using an elastomeric optical polymer, or (2) a fluidic adaptive/tunable device with a hollow or solid retroreflector. While manufacturing errors could change the operation of a retroreflector with an elastic polymer in front, by slightly changing the direction of return light, such errors can be easily quantified and corrected, for example, by monitoring the overall optical performance with an interferometer.

For the elastomeric optical polymer option, the elastic polymer is molded to a desired initial shape, and the change on the polymer surface can be effected by applying pressure/compression to the polymer. An elastomeric optical polymer is a polymer that has high transmission at the user-desired operational wavelength and has elastic properties which allow the solid substrate to be deformed. A second alternative to deform the polymer is dielectric elastomer actuation, in which a voltage is applied to a pliable electrode and the polymer is deformed, creating the change on its surface. Figure 1 shows the configuration of the elastomeric optical polymer, as well as three operational states of the adaptive retroreflector.

**Figure 1.** Schematic of an adaptive retroreflector using a solid elastomeric polymer: (**a**) flat, (**b**) convex and (**c**) concave.

For the fluidic adaptive/tunable option, an elastomeric membrane encapsulates an optical fluid which is mounted on the front of a hollow or solid retroreflector. The elastomeric membrane needs to have optical and mechanical properties similar to those described above for the solid option. The optical fluid needs to be optically and chemically compatible with the membrane and needs to have high transmission at the operational wavelength. Polydimethylsiloxane is a common polymer that can be used for the membrane, as well as for the elastomeric solid option. For the optical fluids, there are numerous oils, polymers and resins that have been studied (for example, water, glycerol, etc.) [3]. The actuation of this system can be achieved by compressing/decompressing the flexible membrane, which creates a change on its surface. This occurs by moving a cylinder along the optical axis of the system, thus compressing the circumference of the flexible membrane. Besides this mechanical action, magnetic actuation or a compliant electrode (dielectric elastomer) can actuate the membrane. There are other suitable actuation techniques that could be implemented as well, such as those used by commercially available fluidic lenses, for example, Varioptics, Optotune and Holochip [6–12]. Figure 2 shows a conceptual sketch of the adaptive retroreflector based on the flexible membrane/fluidic concept.


**Figure 2.** Schematic of an adaptive retroreflector using an optical liquid/fluid: (**a**) flat, (**b**) convex and (**c**) concave.

The latter configuration was selected for this paper. The device changes the divergence of the returned beam but can also work as a regular passive retroreflector if the system requires.

#### **3. Fabrication, Characterization and Results**

Here, we present the optical design of the AR as fabricated and in the configuration we used for testing. We describe the optical setups used for testing and comparing the device to a passive retroreflector. Data for the repeatability and arbitrary radius of curvature measurements were acquired using a Zygo Verifire HD optical interferometer. Furthermore, we show images comparing a passive retroreflector with the AR, both actuated and flat.

#### *3.1. AR Optical Design*

We used OpticStudio nonsequential tools to model the corner cube retroreflector and adaptive components and, for visualization purposes, a beam splitter cube was added, as shown in Figure 3. This was also the configuration chosen for the test. Note that we did not model the thickness of the membrane, since the effects of the membrane are negligible in OpticStudio for this type of application. The model was performed using the volume of the fluid, with the fluid acting as a lens which changes its radius of curvature and center thickness. The figure shows a collimated beam incident on a beam splitter cube, which reflects part of the incident light and transmits a portion, which then impinges on the adaptive retroreflector. Light is reflected back from the retroreflector, reflected again from the beam splitter cube and incident on the detector. In field operations, the beam splitter can be used to monitor the incoming beam and direct the adaptive retroreflector in order to control the divergence. The system can also be used without the beam splitter cube, such that the beam is monitored at the receiver side and the system optimized in a power-in-the-bucket (PIB) configuration using well-known algorithms (e.g., stochastic parallel-gradient-descent) [13,14]; the AR is then instructed by this information.
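The PIB optimization referenced above can be sketched with a minimal SPGD loop. This is not the authors' implementation; the quadratic "PIB" metric, gain and perturbation size are invented for illustration:

```python
import random

random.seed(1)

def spgd_step(u, metric, gain=100.0, perturb=0.01):
    """One stochastic parallel-gradient-descent (SPGD) iteration: apply a
    random +/- perturbation to every control channel, measure the metric
    on both sides, and step each channel in the direction that improved it."""
    delta = [perturb * random.choice((-1.0, 1.0)) for _ in u]
    j_plus = metric([ui + di for ui, di in zip(u, delta)])
    j_minus = metric([ui - di for ui, di in zip(u, delta)])
    dj = j_plus - j_minus
    return [ui + gain * dj * di for ui, di in zip(u, delta)]

# Toy power-in-the-bucket metric, peaked at a control value of 0.3
# (a stand-in for the receiver-side power reading).
pib = lambda u: 1.0 / (1.0 + (u[0] - 0.3) ** 2)

u = [0.0]  # e.g., a single divergence command sent to the AR
for _ in range(500):
    u = spgd_step(u, pib)
```

In a real link the metric would be the measured receiver power rather than an analytic function, and the loop rate would be limited by the actuator speed.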

#### *3.2. Fabrication of the AR*

For this particular design we used a 12.7 mm corner cube retroreflector from Thorlabs, polydimethylsiloxane (PDMS) as the membrane, glass support structures for the PDMS, and an optical fluid with an index of refraction of 1.45 and an Abbe number of 45.0 (at λ = 589 nm). The first step consisted in making the PDMS membrane, which was then bonded to the glass support structure. The fluid was added to the membrane/glass structure and the corner cube was bonded to it. The last step was to mount the AR into the actuator and start the testing—the assembly steps are shown in Figure 4. An important note: for this proof of concept, we did not follow the special fabrication procedures that we normally utilize to reduce the surface wavefront error, which involve the reduction of coma induced by gravity and of astigmatism resulting from the materials and fabrication procedures.


Figure 4 shows the top-level schematic of the assembly procedure, as well as pictures of the assembled AR (in its actuator). The actuator was custom-made for our adaptive lenses, and we were able to modify one to accommodate the AR. The actuator consists of a modified motor in a custom housing, with a maximum clear aperture of 19.5 mm, an optical encoder with a resolution of ~50 nm, a speed of ~2.5 mm/s, peak power consumption of ~15 W, idle power consumption of ~1–500 µW and temperature monitoring to 0.01 °C. The electronics can control two actuators at the same time and can run off three CR-123 batteries (two batteries for a single actuator), with an average of about 6000 actuations per set of batteries.


**Figure 3.** Optical design setup used for testing including the collimated source incident on the beam splitter cube with the outputs: (**a**) a collimated beam, (**b**) a converging beam, and (**c**) a divergent beam.

**Figure 4.** Schematic representation of the fabrication and assembly process of the AR.

#### *3.3. Laboratory Optical Setup*



The AR was tested in various ways. Firstly, in the same way that we measure the radius of curvature (ROC) of lenses using a Zygo interferometer—for this device we measured two ROCs as a test, a positive (convex surface) ROC of 234 mm and a negative (concave surface) ROC of −185 mm. The second setup was used to compare the performance, and as a proof of concept, of the AR against a passive retroreflector, as shown in Figure 5. We used the HeNe 633 nm source of the Zygo interferometer, and a 1550 nm source was co-aligned for further testing. Data from the 1550 nm source are not included, but performance of the active surface component at this wavelength has been demonstrated in a previous report [5]. We were able to use the beam collimated or with the addition of a known divergence that could be removed with the AR and compared with the passive retroreflector. The setup with the beam splitter cube allowed us to look at the return beam with the interferometer and, on the other arm, to look at the output with a camera, photodetector or power meter; we were able to use beam blocks to look at each retroreflector individually or, additionally, to view the interference fringes formed by the two. This facilitated alignment, as well as monitoring of the difference when the AR was actuated.


This same setup was also used to perform first-order measurements of the repeatability of the AR by actuating the AR from a flat state to a convex or concave state and back to a flat state, while measuring the surface form with the interferometer. An important note: temperature was monitored in the room, but temperature compensation of the AR was not used—the room environment was stable and thus compensation was not required.
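Actuating from flat to a convex or concave state amounts to imposing a small center deflection (sag) on the membrane, and for a spherical cap the sag and ROC are related by R = (a² + s²)/(2s). A sketch of this relation follows; the 50 µm sag is an assumed, illustrative value, not a measurement from this work:

```python
def roc_from_sag(sag_m, semi_aperture_m):
    """Radius of curvature of a spherical cap with center deflection
    `sag_m` over a clear semi-aperture `semi_aperture_m`:
    R = (a**2 + s**2) / (2*s); the sign of R follows the sign of the sag."""
    return (semi_aperture_m**2 + sag_m**2) / (2.0 * sag_m)

# An assumed 50 um center deflection over a 6.35 mm semi-aperture (half
# of the 12.7 mm corner cube) corresponds to an ROC of roughly 0.4 m,
# so an encoder resolution of ~50 nm allows very fine surface control.
R = roc_from_sag(50e-6, 6.35e-3)
```

The relation also shows why small actuator strokes are sufficient: ROCs of hundreds of millimetres require sags of only tens of micrometres at this aperture.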



**Figure 5.** (**Left**) Layout of the optical testing setup. (**Right**) Picture of optical devices used for measurements.

#### *3.4. Results/Discussion*

#### 3.4.1. ROC and Surface Measurements

The first test consisted of measurements of the ROC, positive and negative, in order to evaluate the performance of the device. Figure 6 shows a set of measurements, including a (left) measurement for a positive ROC of 255 mm and a (right) measurement for a negative ROC of 184 mm. The top row shows the 3D surface profile, and the bottom row the 2D profile. The black circles in the figure are software masks used to remove unnecessary back reflections created by dust particles in the reference sphere. Within the respective figures, the left column (A or C) is the raw measurement and the right column (B or D) has the dominant aberrations removed. As mentioned before, fabrication was not optimized for the surface figure, but what can be seen are the typical dominant aberrations of coma and astigmatism, which are characteristic of this type of fluidic structure. Coma is due to gravity and astigmatism is due to fabrication or assembly procedures. For the positive ROC case demonstrated below, coma is the dominant aberration. For the negative ROC, there is a combination of coma and astigmatism, because measurements were taken close to the negative resting ROC (fabricated ROC) of the membrane for the fabricated AR device. The fabrication aberrations were more noticeable closer to the resting ROC because, for this type of actuation mechanism, this is the point of contact where boundary conditions are established between the membrane and the actuation surface for the clear aperture. At this point, the amplitude of any existing aberrations can be enhanced. Another aberration that can be noticed is trefoil on both ROCs—this was purely due to the assembly in the actuator.

We developed procedures for fabrication and assembly that reduce the dominant aberrations, which are implemented when building adaptive polymer lenses, with the caveat that we can minimize coma based on the application but do not completely eliminate it. The procedure to eliminate coma during fabrication is extremely complex, costly and time consuming if performed at the active surface. There are other ways to minimize it, including using a corrective element along the optical path of the system or close to the active surface, and this is a typical configuration used in commercial adaptive/tunable lenses [11].
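For context, the measured ROCs can be translated into single-pass focal lengths with the thin-lens lensmaker's equation for a plano-convex/plano-concave element, 1/f = (n − 1)/R. This is a sketch, not an analysis from the paper; note that in the retroreflector configuration the beam traverses the fluid element twice, roughly doubling the effective optical power:

```python
def fluid_lens_focal_length(roc_m, n=1.45):
    """Thin-lens, single-pass focal length of a plano-convex (R > 0) or
    plano-concave (R < 0) fluid lens: 1/f = (n - 1) / R."""
    return roc_m / (n - 1.0)

# Using the fluid index of 1.45 quoted in Section 3.2 and the ROCs above:
f_pos = fluid_lens_focal_length(0.255)   # +255 mm ROC -> ~+0.57 m
f_neg = fluid_lens_focal_length(-0.184)  # -184 mm ROC -> ~-0.41 m
```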


**Figure 6.** (**Left**) Positive ROC measurement (**A**,**B** columns) 2D and 3D profiles with dominant aberration removed, in this instance being coma (**B** column). (**Right**) Negative ROC measurement (**C**,**D** columns) 2D and 3D profiles with dominant aberration removed, in this instance, coma and astigmatism (**D** column).


Using the data from Figure 6, a Zernike fit was performed using the Mx software tools from the interferometer; the coefficients of the fit for both the positive and negative ROCs are shown in Table 1.
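Such a fit is, at its core, a linear least-squares projection of the measured surface onto a set of Zernike terms. A minimal sketch follows; the term subset, ordering and normalization are illustrative and will not match the Mx software's conventions exactly:

```python
import numpy as np

def fit_low_order_zernikes(rho, theta, surface):
    """Least-squares fit of a few low-order Zernike terms to surface data
    sampled at polar coordinates (rho in [0, 1], theta in radians).
    Columns: piston, tilt x/y, defocus, astigmatism 0/45 deg, coma x/y."""
    cols = [
        np.ones_like(rho),                           # piston
        rho * np.cos(theta),                         # tilt x
        rho * np.sin(theta),                         # tilt y
        2.0 * rho**2 - 1.0,                          # defocus
        rho**2 * np.cos(2 * theta),                  # astigmatism 0/90
        rho**2 * np.sin(2 * theta),                  # astigmatism 45
        (3.0 * rho**3 - 2.0 * rho) * np.cos(theta),  # coma x
        (3.0 * rho**3 - 2.0 * rho) * np.sin(theta),  # coma y
    ]
    A = np.stack(cols, axis=-1)
    coeffs, *_ = np.linalg.lstsq(A, surface, rcond=None)
    return coeffs

# Synthetic check: a pure-defocus surface should return only the defocus term.
rng = np.random.default_rng(0)
rho = rng.uniform(0, 1, 500)
theta = rng.uniform(0, 2 * np.pi, 500)
surface = 0.8 * (2.0 * rho**2 - 1.0)
c = fit_low_order_zernikes(rho, theta, surface)
```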

**Table 1.** Results from the Zernike fit coefficients obtained from the data in Figure 6, for positive ROC (top) and negative ROC (bottom).


Figure 7 shows data taken for the AR at the same ROCs mentioned above but in a perpendicular configuration, in order to eliminate the effects of coma due to gravity. Note that no terms have been removed from these data. Astigmatism and trefoil were noticeable, but the large magnitude due to coma was absent.


**Figure 7.** (**A**) Positive and (**B**) negative ROC, 2D and 3D surface representation for the perpendicular setup. Right side shows a picture of the setup.


The same procedure was performed on the results from Figure 7, and Table 2 shows the Zernike fit coefficients for the perpendicular measurements.

**Table 2.** Results from the Zernike fit coefficients obtained from the data in Figure 7, for positive ROC (top) and negative ROC (bottom).


#### 3.4.2. Repeatability Measurements

Repeatability measurements were taken using the setup in Figure 5. The data collection consisted in changing the actuation state by a known encoder count from a flat state to a convex/concave state, while recording the encoder position as well as the data from the interferometer. The encoder data take the form of a set of three numbers: the position set by the user (the state of the lens), the temperature-compensated position (once thermal compensation is activated) and the measured position. This last position, or its difference from the set position, was recorded. Readings from the interferometer, PV(λ) (peak to valley) and power(λ), were recorded as well. Data were taken for deltas of 150 and 300 encoder counts from flat, in both the positive (convex, higher encoder counts) and negative (concave, lower encoder counts) directions. Figure 8 shows a sequence of consecutive measurements from flat to positive, and Figure 9 shows measurements from flat to negative, for a delta of 150 encoder counts. Note, the hexagonal pattern was a result of the facets of the corner cube. This was noticeable in this configuration based on the testing setup with the interferometer using a transmission flat. For the ROCs the measurements differed, since we were using a reference sphere and the spherical wavefront matched the deformed membrane, not the retroreflector.

the inferferometer. The encoder data is in the form of a set of three numbers: the set position by user (state of the lens), the temperature compensate position (once thermal compensation is activated) and the measured position. This last position, or the difference from the set position, was recorded. Readings from the interferometer PV(λ) (peak to val-

the inferferometer. The encoder data is in the form of a set of three numbers: the set position by user (state of the lens), the temperature compensate position (once thermal compensation is activated) and the measured position. This last position, or the difference

Photonics 2022, 8, x FOR PEER REVIEW 8 of 10

Photonics 2022, 8, x FOR PEER REVIEW 8 of 10

deformed membrane, not the retroreflector.

**Figure 8.** The AR was actuated from the flat state to a compressed state (or convex surface) and back to flat. For each case the top row is the 3D surface and the bottom row the 2D surface.

**Figure 9.** The AR was actuated from the flat state to a less compressed state (or concave surface) and back to flat. For each case the top row is the 3D surface and the bottom row the 2D surface.

In Figure 10, data are presented in graphical form (with error bars based on the standard deviation) and in tabular form for the sequences, with 20 data points for a delta of 150 and 10 points for a delta of 300, all cases starting from the same initial flat position. The average and standard deviation of the encoder position and peak-to-valley for each case are shown in the table.

Figure 11 shows a comparison of the AR with a passive retroreflector. A screen was placed at a distance of approximately 1500 mm, the response from a collimated beam was recorded, and the AR was actuated to focus the beam on the screen.



**Figure 10.** Graphical and tabular representation of the actuation sequence for the cases described above. Delta values refer to the change in encoder position from the flat state.

**Figure 11.** Images taken in the laboratory at a known distance from the retroreflector are shown (**left**) conventional retroreflector, (**middle**) AR in the flat state, and (**right**) both retroreflectors overlapping in the screen with the AR actuated to focus the beam at that particular distance.

#### **4. Conclusions**

We have presented the concept of an adaptive retroreflector. This concept was developed during a data campaign to study atmospheric turbulence in a dynamic link, with the end goal of an optical anemometer for UAV applications in which the propagation distance changes rapidly. The AR was then designed, fabricated and tested in a laboratory environment as a proof of concept. This particular device can operate from the VIS to the SWIR, and preliminary parameters of its performance were studied.

The next step will consist of fabricating a device following the tighter-tolerance procedures developed previously for adaptive lenses. A follow-up report will cover calibration in a laboratory environment, including thermal compensation and quantification of the losses added by absorption and/or scattering in the membrane/fluid combination, in comparison with a conventional retroreflector. The latter case will be studied in more detail in a field experiment, where we can compare the losses due to the addition of the membrane/fluid combination with the losses of a conventional retroreflector (e.g., due to diffraction, or divergence introduced by atmospheric turbulence) over a propagation path. A power-in-the-bucket configuration will be used to compare the divergence control of the adaptive retroreflector with that of a conventional one. While the overall losses depend on configuration and materials, our experience with fluidic lenses has shown that the transmission losses are negligible compared to the effects induced by turbulence.

#### **5. Patents**

A provisional patent application has been submitted, U.S. Patent Application Serial No. 62/695,310.

**Author Contributions:** Conceptualization, F.S. and C.O.F.; methodology, F.S., C.O.F., B.E.B. and S.R.R.; formal analysis, F.S., S.N.Q. and S.R.R.; writing—original draft preparation, F.S., C.O.F., S.R.R.; writing—review and editing, S.R.R., S.N.Q. and B.E.B.; funding acquisition, F.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data is contained within the article.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Qi Hu, Yuyao Xiao, Jiahe Cui, Raphaël Turcotte and Martin J. Booth \***

Department of Engineering Science, University of Oxford, Oxford OX1 3PJ, UK **\*** Correspondence: martin.booth@eng.ox.ac.uk

**Abstract:** We show that there is an intrinsic link between the use of Walsh aberration modes in adaptive optics (AO) and the mathematics of lattices. The discrete and binary nature of these modes means that there are infinite combinations of Walsh mode coefficients that can optimally correct the same aberration. Finding such a correction is hence a poorly conditioned optimisation problem that can be difficult to solve. This can be mitigated by confining the AO correction space defined in Walsh mode coefficients to the fundamental Voronoi cell of a lattice. By restricting the correction space in this way, one can ensure there is only one set of Walsh coefficients that corresponds to the optimum correction aberration. This property is used to enable the design of efficient estimation algorithms to solve the inverse problem of finding correction aberrations from a sequence of measurements in a wavefront sensorless AO system. The benefit of this approach is illustrated using a neural-network-based estimator.

**Keywords:** lattice geometry; Walsh functions; adaptive optics

## **1. Introduction**

Many adaptive optics (AO) methods have been developed to compensate phase aberrations in a range of applications including astronomy, ophthalmology and microscopy [1–3]. All AO systems are limited, in some way, by the capabilities of the adaptive element, typically a deformable mirror (DM) or a spatial light modulator (SLM), that corrects the aberrations. One such limitation is in the range of phase functions that the element can correct. The correction space of an AO element is defined by the range of phase functions that can be imparted by the device. For a pixelated AO device, such as a SLM or segmented DM, the correction space is defined by the set of accessible pixel states, which could be represented by the set of phase values for each pixel.

In many AO systems, it is preferable to design the system around a set of orthogonal modes for representation and control of the wavefront, rather than localized wavefront modulations. For example, wavefront-sensorless AO systems usually use a modal basis [4,5]. This method involves the sequential application of predetermined bias aberrations, the acquisition of a set of measurements of an appropriate quality metric, and then estimation of the required correction aberration. The conventional approach to sensorless AO is to use knowledge of the forward problem—that is, how the quality metric is affected by input aberrations—to inform the design of an efficient estimation scheme that, in effect, solves the inverse problem of finding the optimal correction aberrations from the set of metric measurements. Such estimation can be performed using optimisation algorithms or neural networks (NN) to solve the inverse problem [6,7].

It is known that control using modes defined across the whole pupil provides stronger modulation of the optimisation metric than individual pixels or subregions of the pupil [5]. Such whole-pupil modulation hence provides more robust operation, particularly in low-light imaging scenarios when the signal-to-noise ratio (SNR) is low. For such pixel-based sensorless AO systems, Walsh modes are an appropriate choice. Walsh modes are a set of orthogonal functions that represent phase patterns across a pixelated pupil, where the number of pixels is equal to a power of 2 [8,9]. Each Walsh mode consists of an equal number of pixels taking each of the binary values +1 or −1. For the different modes, a different combination of pixels takes the positive and negative values.

**Citation:** Hu, Q.; Xiao, Y.; Cui, J.; Turcotte, R.; Booth, M.J. The Lattice Geometry of Walsh-Function-Based Adaptive Optics. *Photonics* **2022**, *9*, 547. https://doi.org/10.3390/photonics9080547

Received: 10 June 2022 Accepted: 30 July 2022 Published: 4 August 2022

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The binary nature of Walsh modes means that the range of each mode that must be searched to find the optimum correction is finite. This contrasts with other modal bases built upon continuous functions, such as Zernike polynomials, which would have unbounded range (albeit limited by the stroke of the adaptive correction element).

However, care needs to be taken when considering combinations of Walsh modes, as multiple combinations of modes can have the same effect on the system. This means that there are multiple potential solutions to the inverse problem of finding the optimal set of Walsh mode coefficients that optimise aberration correction. These multiple solutions can cause complications in defining an estimator to solve the inverse problem. Solving the inverse problem would be considerably simplified if we could ensure that there was only one optimal solution in the search space.

We show that there are properties of the Walsh modes that link the operation of these sensorless AO systems to the mathematics of lattices [10]. We discuss how these mathematical properties can aid the design of aberration estimation algorithms by constraining the search space. Specifically, we show heuristically that through understanding of the lattice geometry, we can define a unique finite search space, in terms of combinations of Walsh mode coefficients, that contains a single optimum correction. This permits the implementation of an efficient NN-based optimisation scheme that can measure and correct any combination of *N* Walsh modes of any coefficient value using only 2*N* + 1 metric measurements. We show that a simple NN can be trained to solve the inverse problem if the search space is constrained using the lattice model, whereas the correct combination of Walsh mode coefficients cannot reliably be found for a nonconstrained search space.

## **2. Optical System Model**

For the purposes of modelling, we considered the simple sensorless adaptive optics system shown in Figure 1. Such a model has been extensively used for analysis of such sensorless systems [11,12], as the principle of operation is readily extendable to similar optical systems, including applications in laser material processing, free-space communications, and laser scanning microscopy.

**Figure 1.** Optical system used for modelling. The input wavefront contains a phase aberration Φ, which passes through a correction device imparting an additional phase Ψ. The beam is focussed onto a vanishingly small pinhole detector on the optical axis that admits the intensity *I*.

The input beam to the system is collimated and has uniform amplitude. The input phase aberration is Φ(**r**) and the phase Ψ(**r**) is added by the adaptive element (AE), which could be a pixelated SLM or a segmented DM. Both are considered to be added at the pupil *P*, which is taken to have unit radius; **r** is the normalised coordinate vector in the pupil. The lens performs a Fourier transform of the pupil field to provide a focal field. A vanishingly small pinhole detector is placed on axis at the centre of the focus and detects a signal *I* that corresponds to the on-axis intensity at the focus. This is equivalent to the power of the zero-frequency component of the Fourier transform, which is equivalent to the squared modulus of the mean value of the pupil field. Mathematically, this can be expressed as

$$I = \left| \frac{1}{\pi} \int\_{P} \exp i[\Phi(\mathbf{r}) + \Psi(\mathbf{r})] d^2 \mathbf{r} \right|^2 \tag{1}$$

where the maximum, aberration-free signal is normalised to 1.
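Equation (1) can be checked numerically. The following Python sketch (not part of the original analysis; the grid size and the defocus test aberration are arbitrary choices) evaluates the normalised on-axis intensity by averaging the pupil field over a sampled unit-radius aperture:

```python
import numpy as np

def pinhole_signal(phase):
    """On-axis focal intensity for a unit-radius circular pupil (Eq. 1).

    `phase` is a 2D array of Phi + Psi sampled on a grid covering [-1, 1]^2;
    samples outside the pupil are masked out. The squared modulus of the
    mean pupil field gives the signal, normalised so a flat wavefront -> 1.
    """
    n = phase.shape[0]
    y, x = np.mgrid[-1:1:1j * n, -1:1:1j * n]
    pupil = x**2 + y**2 <= 1.0
    return abs(np.exp(1j * phase)[pupil].mean())**2

n = 512
print(pinhole_signal(np.zeros((n, n))))         # ~1.0 for a flat wavefront

y, x = np.mgrid[-1:1:1j * n, -1:1:1j * n]
print(pinhole_signal(0.5 * (x**2 + y**2)))      # < 1 with a defocus aberration
```

Any nonconstant phase reduces the magnitude of the mean pupil field, so the signal drops below 1, as expected.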

#### **3. Representation of Aberrations as Walsh Modes**

For simplicity, let us assume Φ(**r**) = 0 so that all aberrations can be represented within Ψ(**r**). Let us also assume that the adaptive element is a pixelated device, where each of the N pixels can introduce a piston phase. The phase introduced by the adaptive element could be expressed as

$$\Psi(\mathbf{r}) = \sum\_{l=1}^{N} \alpha\_{l} \eta\_{l}(\mathbf{r}) \tag{2}$$

where *η<sub>l</sub>*(**r**) are the phase influence functions of the pixels, which have value 1 within the *l*th pixel area and 0 elsewhere; *α<sub>l</sub>* are the coefficients of these influence functions, which correspond to the pixel phase values. Alternatively, we could represent the AE phase as

$$\Psi(\mathbf{r}) = \sum\_{k=0}^{N-1} \beta\_k \omega\_k(\mathbf{r}) \tag{3}$$

where *ω<sub>k</sub>*(**r**) are functions that take binary values of −1 or +1 in each pixel region, such that the *l*th pixel takes on the *l*th value of the *k*th Walsh function of length *N*, *W<sub>k</sub><sup>N</sup>*[*l*] [8]; *β<sub>k</sub>* are the coefficients of these functions *ω<sub>k</sub>*(**r**). For a given sequence length *N* = 2<sup>*γ*</sup>, where *γ* is an integer, there are *N* orthogonal Walsh functions, each of which consists of *N*/2 elements of value −1 and *N*/2 of value +1, except for the first function, which consists entirely of 1 s (see examples in Figure 2). We follow the convention that the Walsh function index starts at *k* = 0. From the above relationships, it is clear that each pixel phase can be represented as

$$\alpha\_{l} = \sum\_{k=0}^{N-1} \beta\_{k} W\_{k}^{N}[l] \tag{4}$$

or alternatively in matrix–vector format as

$$\mathbf{a} = \mathbf{W}^T \mathbf{b} \tag{5}$$

where **a** is a vector of length *N* that contains the phase value of each pixel, **b** is a vector of length *N* that contains the coefficient of each Walsh function, and **W** is an *N* × *N* Walsh–Hadamard matrix consisting of values ±1 [13,14]. The rows of this matrix correspond to each of the Walsh functions, and **W** provides the mapping between the Walsh coefficients and the pixel values. For Hadamard matrices, **WW**<sup>*T*</sup> = *N***I**<sub>*N*</sub>, where **I**<sub>*N*</sub> is the identity matrix of size *N* [13,14]. Hence, we can invert Equation (5) as

$$\mathbf{b} = \frac{1}{N} \mathbf{W} \mathbf{a} \tag{6}$$

Note that for a set of Walsh functions to be defined, we require *N* = 2<sup>*γ*</sup>, where *γ* is an integer; we therefore assume throughout this paper that *N* = 2<sup>*γ*</sup>. However, Hadamard matrices also exist for *N* = 4*γ*, where *γ* is an integer [13,14]. For simplicity, these other matrices will not be considered in the current analysis.
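The relations in Equations (5) and (6) can be verified directly. The sketch below (illustrative only; it uses the Sylvester construction, which yields the Walsh functions in Hadamard rather than sequency order, a choice that does not affect the algebra) builds **W** and round-trips between pixel values and Walsh coefficients:

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix, n = 2**gamma.

    Its rows play the role of the Walsh functions W_k^N[l].
    """
    if n == 1:
        return np.array([[1]])
    h = hadamard(n // 2)
    return np.block([[h, h], [h, -h]])

N = 8
W = hadamard(N)

# W W^T = N I_N, so Eq. (5), a = W^T b, inverts as b = (1/N) W a, Eq. (6).
assert np.array_equal(W @ W.T, N * np.eye(N, dtype=int))

rng = np.random.default_rng(0)
b = rng.normal(size=N)       # Walsh coefficients
a = W.T @ b                  # pixel phase values, Eq. (5)
b_back = W @ a / N           # recovered coefficients, Eq. (6)
assert np.allclose(b, b_back)
```

`scipy.linalg.hadamard` could be used instead of the hand-rolled constructor.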

**Figure 2.** Illustrations of Walsh modes. (**a**) The Walsh functions *W<sub>k</sub>*<sup>8</sup>[*l*] shown in numerical form. (**b**) The Walsh modes *W<sub>k</sub>*<sup>16</sup>[*l*] shown as aberration basis modes over a square aperture. (**c**) The polar Walsh modes, equivalent to *W<sub>k</sub>*<sup>16</sup>[*l*], shown as aberration basis modes over a circular aperture. In both (**b**) and (**c**), index *k* = 0 for the first mode is in the top left, and *k* increments in row-major order to *k* = 15 in the bottom right.

#### **4. Lattice Symmetry of Aberration States**

We can define the aberration state fully through the vector of pixel values **a**. A point at position **a** in the *N*-dimensional space of pixel values represents the same aberration state as any point where a coefficient *α<sub>l</sub>* is replaced by *α<sub>l</sub>* + 2*πq*, with *q* an arbitrary integer. This reveals a repetitive structure in each coordinate of the *N*-dimensional space that results in a lattice-type symmetry. Hence, in this space, there is an infinite number of points that represent a given aberration state, and these points are arranged in a transformed integer lattice ℤ<sup>*N*</sup> that is scaled by a factor of 2*π* and offset by the pixel value *α<sub>l</sub>* along each dimension. Furthermore, the lattice structure is based around a fundamental unit that is an *N*-dimensional cube of side length 2*π*; this fundamental unit is known as a Voronoi cell [10].

The matrix–vector operation of Equation (6) can now be interpreted as a rotation (as **W** is an orthogonal matrix) and scaling by 1/*N* of the vector **a** to give the Walsh coefficient vector **b**. The lattice symmetry is hence maintained in a rotated and scaled form when the state is described by **b**. When represented by the vector **a**, any Walsh function consists of equal magnitude amounts (+1 or −1) of each pixel value, so the vector must be directed along certain body diagonals of the cubic Voronoi cell. After transformation, these body diagonals lie along the axes of the vector space containing **b**. This lattice symmetry will be used for derivations later in this article.

## **5. Effects of Pixels and Modes on Signal Modulation**

Let us assume that each pixel of the AE has equal area, so that the integration in Equation (1) can be replaced by a summation, again taking Φ(**r**) = 0. (The pixels should have equal area if the amplitude profile at the pupil is uniform; for nonuniform illumination, the pixel areas should instead be varied to give the same total power in each pixel, e.g., larger pixels near the edge of the pupil for a Gaussian illumination profile.) No constraint is placed here on the position or shape of the pixels:

$$I = \left| \frac{1}{N} \sum\_{l=1}^{N} \exp\left(i\alpha\_l\right) \right|^2 = \left| \frac{1}{N} \sum\_{l=1}^{N} \exp\left(i \sum\_{k=0}^{N-1} \beta\_k \mathcal{W}\_k^N[l] \right) \right|^2 \tag{7}$$

If the arbitrarily chosen *l*th pixel is varied and all other pixel values have the same value (here arbitrarily set to zero), then

$$I(\alpha\_l) = \left(1 - \frac{D}{2}\right) + \frac{D}{2}\cos\alpha\_l \tag{8}$$

where the modulation depth *D* = 4(*N* − 1)/*N*<sup>2</sup>. For all Walsh functions other than *W*<sub>0</sub><sup>*N*</sup>[*l*], the signal is also cyclic as a function of *β<sub>k</sub>*:

$$I(\beta\_k) = \cos^2 \beta\_k \tag{9}$$

where the modulation depth has the maximum possible value of 1 and the period in terms of *β<sub>k</sub>* is *π*. The effects of single-pixel and modal variations are shown in Figure 3.
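The contrast between Equations (8) and (9) can be reproduced numerically. The following sketch (illustrative; the pixel count and scan resolution are arbitrary) scans a single pixel and then a Walsh mode, confirming the modulation depth *D* = 4(*N* − 1)/*N*² and the cos²*β* modal response:

```python
import numpy as np

def signal(a):
    """On-axis intensity for pixel phases a (discretised Eq. 7)."""
    return abs(np.exp(1j * np.asarray(a)).mean())**2

N = 16
scan = np.linspace(0, 2 * np.pi, 201)

# Single-pixel scan: low modulation depth D = 4(N-1)/N**2 and period 2*pi (Eq. 8).
single = [signal([al] + [0.0] * (N - 1)) for al in scan]
D = max(single) - min(single)
assert np.isclose(D, 4 * (N - 1) / N**2)

# Walsh-mode scan: pixels split evenly between +beta and -beta, giving
# I(beta) = cos(beta)**2 with full modulation and period pi (Eq. 9).
walsh_row = np.array([1, -1] * (N // 2))   # any Walsh function with k != 0
modal = [signal(beta * walsh_row) for beta in scan]
assert np.allclose(modal, np.cos(scan)**2)
```

This is the quantitative reason whole-pupil Walsh modulation is preferred at low SNR: the modal scan swings the metric over its full range, while a single pixel barely perturbs it for large *N*.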

**Figure 3.** (**a**) Variation of signal for a 16-pixel system with a single pixel modulation showing low modulation depth and period 2*π*. (**b**) variation of a Walsh mode in the same system showing full modulation and period *π*.

If a combination of Walsh modes is present with small coefficients, we can use a Maclaurin expansion of the exponential in Equation (7) to give

$$I \approx 1 - \sum\_{j} \sum\_{k} \beta\_{j} \beta\_{k} \left(\frac{1}{N} \sum\_{l} \mathcal{W}\_{j}^{N}[l] \mathcal{W}\_{k}^{N}[l]\right) + \left[\sum\_{k} \beta\_{k} \left(\frac{1}{N} \sum\_{l} \mathcal{W}\_{k}^{N}[l]\right)\right]^{2} \tag{10}$$

The term in the final bracket, (1/*N*)∑<sub>*l*</sub> *W<sub>k</sub><sup>N</sup>*[*l*], is equal to zero except when *k* = 0, in which case it has value 1. The orthogonality property of the Walsh functions means that (1/*N*)∑<sub>*l*</sub> *W<sub>j</sub><sup>N</sup>*[*l*]*W<sub>k</sub><sup>N</sup>*[*l*] in the second term is equivalent to the Kronecker delta *δ<sub>jk</sub>*. Hence, the signal is approximately

$$I \approx 1 - \sum\_{k=1}^{N-1} \beta\_k^2 \tag{11}$$

This is equivalent to the well-known approximation of the Strehl ratio as 1 − *φ*<sub>*rms*</sub><sup>2</sup>, where *φ<sub>rms</sub>* is the root-mean-square value of the aberration, which in this case is equal to √(∑<sub>*k*</sub> *β<sub>k</sub>*<sup>2</sup>).
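The small-aberration approximation of Equation (11) is easy to check against the exact signal. A minimal sketch (illustrative; the coefficient scale 0.02 is an arbitrary "small aberration" choice, and the Hadamard constructor repeats the Sylvester recursion used earlier):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; rows act as Walsh functions.
    if n == 1:
        return np.array([[1]])
    h = hadamard(n // 2)
    return np.block([[h, h], [h, -h]])

def signal(a):
    # Discretised Eq. (7): on-axis intensity from pixel phases.
    return abs(np.exp(1j * np.asarray(a, dtype=float)).mean())**2

N = 16
W = hadamard(N)
rng = np.random.default_rng(1)
b = np.zeros(N)
b[1:] = 0.02 * rng.normal(size=N - 1)  # small non-piston Walsh coefficients

a = W.T @ b                            # pixel phases, Eq. (5)
I_exact = signal(a)
I_approx = 1 - np.sum(b[1:]**2)        # Eq. (11), Strehl-like approximation
assert abs(I_exact - I_approx) < 1e-4
```

The residual discrepancy is fourth order in the coefficients, consistent with the Maclaurin expansion used to derive Equation (10).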

#### **6. Defining a Well-Corrected System**

If we define our system to be "well-corrected" when the root-mean-square (rms) phase error is below a chosen value, such that *φ<sub>rms</sub>* ≤ *e*, then the system will be well-corrected when *I* ≥ 1 − *e*<sup>2</sup>. We can also express the second term in Equation (11) through a length *N* − 1 vector **b**′, which is equivalent to **b** with the piston coefficient removed, as

$$I \approx 1 - \left| \mathbf{b}' \right|^2 \tag{12}$$

Our condition for being well-corrected is hence equivalent to requiring that |**b**′| ≤ *e*. Interpreted geometrically, this means that any point within an (*N* − 1)-dimensional spherical volume of radius *e* centred on the point where *I* = 1 will be considered well-corrected.

In practice, the total aberration in a system will be the sum of the input aberration and that introduced by the AE, that is, Φ(**r**) + Ψ(**r**). The values of **b**′ discussed here consolidate these two sources of aberration to represent the residual aberration, such that we seek a perfect correction for which **b**′ = **0**. We will also assume for this analysis that the input aberration consists entirely of modes that can be corrected by the AE.

## **7. Vector Space Representation of Well-Corrected States**

The signal variation as a function of pixel values was given in Equation (7) and can be expressed alternatively as

$$I = \left| \frac{1}{N} \sum\_{l=1}^{N} \exp\left(i\alpha\_{l}\right) \right|^{2} = \frac{1}{N^{2}} \left| 1 + \sum\_{l=2}^{N} \exp\left(i\alpha\_{l}'\right) \right|^{2} \tag{13}$$

where *α*′<sub>*l*</sub> = *α<sub>l</sub>* − *α*<sub>1</sub>. From this, we find that the value *I* = 1 can only be obtained if *α*′<sub>*l*</sub> mod 2*π* = 0 for all *l*. We derive this result by considering the phasor sum of the terms in the final modulus expression: the maximum signal is only obtained when all of the exponential terms in the summation are real.

In a more general case where all pixels are offset by a mean pixel phase value *c* rather than by the first pixel phase value, we could state that *I* = 1 only if each element of the vector **a** has a value *α<sub>l</sub>* = *c* + 2*nπ*, where *n* is an arbitrary integer. We can also express the signal explicitly as *I*(**a**); this is a function of the vector **a**, which describes a point in an *N*-dimensional space. In this way, we can see that *I*(**a**) has maximum value 1 at the origin of this space when *c* is zero. Furthermore, we see that there is an infinite number of points in this space at which *I*(**a**) = 1. For example, on each of the axes, there are points where *I*(**a**) = 1 that are equally spaced at steps of 2*π*. Varying the value of *c* is equivalent to adding a constant phase to every pixel (or, equivalently, adding the piston mode to the whole pupil) and thus has no effect on the signal. We deduce therefore that there are infinite lines of *I*(**a**) = 1 parallel to the vector (1, 1, . . . , 1)<sup>*T*</sup>. As the value of *c* has no effect on the signal, we can set it arbitrarily to zero without affecting further analysis. This is equivalent to removing the piston mode. It is also equivalent to taking the (*N* − 1)-dimensional subspace including the origin in an orientation orthogonal to the direction (1, 1, . . . , 1)<sup>*T*</sup>. The positions of the maxima in this slice are equivalent to the positions of a scaled version of the integer lattice ℤ<sup>*N*</sup>, as explained in Section 4, projected along the direction (1, 1, . . . , 1)<sup>*T*</sup>. An illustration is provided in Figure 4.

**Figure 4.** Illustration of the lattice geometry for intensity variation with pixel phase value. As it is not possible to represent higher order systems in a three-dimensional rendering, the example shown is for a three-pixel system. While this system does not use Walsh modes, it shows the same phenomena of piston invariance and lattice-like behaviour. The axes represent each of the pixel phase values in radians. The same volume rendering is shown from two different angles. The visible contours are set at *I* = 0.01 (blue) and *I* = 0.8 (orange) to show the positions of the zeros and the maxima, respectively. The function *I* is invariant with the piston mode, hence the elongation of the contours along the direction (1, 1, 1). The lattice like structure of the function is apparent, in this case in the form of the hexagonal lattice. This shows that there are many different combinations of aberration mode coefficients that provide a similar well-corrected state. Analogous behaviour is found in higher dimensions for the Walsh-mode-based systems.

#### **8. Lattice Representation of States after Removal of Piston**

The removal of the piston mode is equivalent to removing the first row of the matrix **W** to create the reduced matrix **W**′ and removing the corresponding element of **b** to obtain a reduced vector **b**′, so that

$$\mathbf{a} = \mathbf{W}^{\prime \, T} \mathbf{b}^{\prime} \tag{14}$$

As row–column products represent dot products between Walsh vectors, the following relationship is valid: **W**′**W**′<sup>*T*</sup> = *N***I**<sub>*N*−1</sub>, so Equation (14) can be inverted as

$$\mathbf{b}' = \frac{1}{N} \mathbf{W}' \mathbf{a} \tag{15}$$

We can interpret Equation (15) as a transformation from an *N*-dimensional vector of pixel values **a** to an (*N* − 1)-dimensional vector of Walsh modal coefficients **b**′.
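The orthogonality of the reduced matrix **W**′, and hence the inversion of Equation (14), can be confirmed numerically. A short sketch (illustrative only; the Sylvester Hadamard constructor from the earlier examples is repeated for self-containment):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; row 0 is the all-ones (piston) Walsh function.
    if n == 1:
        return np.array([[1]])
    h = hadamard(n // 2)
    return np.block([[h, h], [h, -h]])

N = 8
W = hadamard(N)
Wp = W[1:, :]                        # W': piston (first) row removed

# The remaining rows are still mutually orthogonal Walsh functions, so
# W' W'^T = N I_(N-1) and Eq. (14) inverts as Eq. (15).
assert np.array_equal(Wp @ Wp.T, N * np.eye(N - 1, dtype=int))

rng = np.random.default_rng(2)
bp = rng.normal(size=N - 1)          # piston-free Walsh coefficients b'
a = Wp.T @ bp                        # pixel phases, Eq. (14)
assert np.allclose(Wp @ a / N, bp)   # recovered coefficients, Eq. (15)
```

Note that any **a** generated this way sums to zero, reflecting the removal of the piston mode.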

The matrix **W**′ is, however, a redundant representation, as its column space has dimension greater than its rank. This is rectified by removing any one of the columns to create the matrix **W**″; we choose arbitrarily to remove the first column. In order to maintain the form of Equation (15), we remove the first element of the vector **a** to produce a new vector **a**′. From a practical perspective, this means that the pixel value *a*<sub>1</sub> is a dependent parameter determined by the other pixel values because of the removal of the piston mode.

$$\mathbf{b}' = \frac{1}{N} \mathbf{W}'' \mathbf{a}'\tag{16}$$

The operation performed by the matrix **W**″ is to map the vector **a**′ to the corresponding vector **b**′. Similarly, the operation of **W**″ transforms (projects) the positions of the maxima of *I*(**a**) = 1, which were located on lines passing through a scaled integer lattice Z<sup>*N*</sup> (as illustrated in Figure 4), to another lattice in the (*N* − 1)-dimensional space spanned by **b**′.

We can determine the properties of this new lattice by considering its Gram matrix, which is the matrix of the inner products between its lattice vectors [10]. The Gram matrix is hence given by

$$\mathbf{G} = \frac{1}{N} \mathbf{W}^{\prime\prime T} \mathbf{W}^{\prime\prime} = \frac{1}{N} \begin{pmatrix} N-1 & -1 & \cdots \\ -1 & N-1 & \cdots \\ \vdots & \vdots & \ddots \end{pmatrix} \tag{17}$$

where the factor of 1/*N* has been chosen so that the basis vectors are equivalent to Walsh functions with normalised vector magnitudes. **G** is equivalent to the Gram matrix of the so-called *A*<sup>∗</sup><sub>*N*−1</sub> lattice [10], which is an (*N* − 1)-dimensional analogue of the body-centred cubic (BCC) lattice in three dimensions. It follows that the maxima in the (*N* − 1)-dimensional space spanned by **b**′ must be located at the lattice points of a scaled *A*<sup>∗</sup><sub>*N*−1</sub> lattice.
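Equation (17), and the determinant that fixes the Voronoi cell volume in Equation (18), can be checked numerically; this is a sketch assuming a Sylvester-ordered Hadamard construction for the Walsh vectors.

```python
import numpy as np

def walsh_matrix(N):
    # Sylvester-ordered Hadamard matrix; first row is piston (all ones)
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.block([[W, W], [W, -W]])
    return W

N = 8
Wpp = walsh_matrix(N)[1:, 1:]     # W'': piston row and first column removed
G = (Wpp.T @ Wpp) / N             # Gram matrix of Eq. (17)

# Diagonal entries (N-1)/N, off-diagonal entries -1/N, as in Eq. (17)
assert np.allclose(np.diag(G), (N - 1) / N)
assert np.allclose(G - np.diag(np.diag(G)), (-1 / N) * (1 - np.eye(N - 1)))

# det G = 1/N, giving the Voronoi cell volume pi^{N-1}/sqrt(N) of Eq. (18)
assert np.isclose(np.linalg.det(G), 1 / N)
```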

Understanding the symmetries of this lattice thus provides an understanding of the symmetries of the function *I*(**b**′). For example, the response of the signal around each lattice point should be identical. In other words, *I*(**b**′ − **d**<sub>*m*</sub>) = *I*(**b**′) for all *m*, where **d**<sub>*m*</sub> represents an arbitrary lattice point. As there is an infinite number of lattice points, there are infinitely many combinations of the (*N* − 1) Walsh coefficients that provide the optimal correction. Furthermore, correction to a precision of *φ<sub>rms</sub>* ≤ *ε* can be achieved by finding a setting for the adaptive correction device that places **b**′ within a sphere of radius *ε* centred upon any of the lattice points.

#### **9. Fundamental Correction Space**

The lattice model allows us to define a fundamental correction space, that is, the range of **b**′ we must search to find an optimal correction. This fundamental correction space is smaller than the correction space covered by all pixel values in the range 0 to 2*π* radians. The lattice symmetry of the function *I*(**b**′) indicates that we need only search the Voronoi cell of the lattice in order to cover all possible states. Therefore, the search space is the Voronoi cell of the scaled *A*<sup>∗</sup><sub>*N*−1</sub> lattice, whose properties are known [10]. The positions of the cell's vertices can be readily calculated. For example, the Voronoi cell of the *A*<sup>∗</sup><sub>3</sub> (or BCC) lattice is a truncated octahedron; this would be the Voronoi cell for an *N* = 4 pixel system and is illustrated in Figure 5.

Using the symmetries of the Voronoi cell, further general properties of this fundamental correction space can be derived. Moving along any of the axes from a lattice point at which *I*(**b**′) = 1, we encounter another lattice point at a distance *b* = *π* (noting that this corresponds to the variation of a single Walsh function, the pixel values of which will be ±*π* for this value of *b*; see Equation (9)). Therefore, the halfway point between lattice points along the axis is at a distance *b* = *π*/2. Hence, the distance between two faces of the Voronoi cell along such an axis is *π*. By looking solely along the axes, one might assume that the search space is an (*N* − 1)-dimensional cube of side length *π*, which would have a volume of *π*<sup>*N*−1</sup>. However, the Voronoi cell's volume is given by

$$
\pi^{N-1}\sqrt{\det \mathbf{G}} = \frac{\pi^{N-1}}{\sqrt{N}} \tag{18}
$$

which is a factor 1/√*N* smaller than the encompassing hypercube [10]. Hence, for large numbers of pixels, the search space is considerably smaller than might be assumed if considering the pixel phases directly.

**Figure 5.** (**a**) Illustration of the lattice geometry for the fundamental correction space of a four-pixel system, which corresponds to three Walsh modes after neglecting the piston mode. The axes represent each of the Walsh coefficient values in radians. The same volume rendering is shown from two different angles. The visible contours are set at *I* = 0.01 (blue) and *I* = 0.8 (orange) to show the positions of the zeros and the maxima, respectively. The BCC lattice geometry is apparent. (**b**) The Voronoi cell for the *A*<sup>∗</sup><sub>3</sub> (or BCC) lattice, a truncated octahedron, is shown within an encompassing cube of side length *π* radians.

#### **10. Implications of the Lattice Structure for Sensorless AO**

We have shown that searching the Voronoi cell of the *A*<sup>∗</sup><sub>*N*−1</sub> lattice is sufficient to find the optimal correction in the whole Walsh coefficient space of the adaptive element. The lattice structure also means that this same cell repeats over the whole space. Consequently, if the aberration in the system can be accurately represented by a finite number of Walsh modes, then the necessary search space is finite. This contrasts with an aberration represented by a finite number of continuous modes, such as Zernike polynomials, where the search space would have to be infinite in extent to cover all possible coefficient values.

In modal sensorless AO correction schemes, a sequence of predetermined bias aberrations for a fixed set of correction modes is applied to the adaptive element and the corresponding signal values are recorded. From this set of measurements, the correction aberration is estimated using an appropriately chosen optimisation algorithm. When using continuous modes, such estimation can provide accurate correction for aberrations over a limited magnitude range but usually provides poor estimation outside this range. Using Walsh modes, however, the finite search space within one Voronoi cell of the lattice structure for a fixed set of modes means that it is possible to design a correction scheme that is accurate across all possible aberrations within the Walsh mode set. For the continuous modes, the bias aberrations span a finite range, such as in the typical configuration for sensorless AO of having equal-magnitude positive and negative biases for each mode. However, the same configuration for Walsh modes in effect spans an infinite range, as the bias positions are also repeated in the lattice structure across the whole coefficient space.

If a sequence of intensity measurements is taken with bias aberrations defined as each Walsh mode with an amplitude of *π*/2, the set of measurements is related to the Walsh transform of the pixel values (see Appendix B). The sum of these measurements is equal to the intensity after aberration correction. This has important implications for the normalisation of measurements for use in aberration estimation algorithms.

#### **11. Neural Network for Solution of the Inverse Problem**

We used the knowledge of the fundamental correction space to design estimators for a sensorless AO scheme. Specifically, the estimation process should solve the inverse problem of obtaining aberration coefficients from the set of biased intensity measurements. We chose to demonstrate this with an NN estimator, whose design incorporates physical knowledge of the system. This method was chosen as it is more readily extendable to more advanced AO systems than conventional optimisation algorithms.

In this demonstration, we compare two similar NN-based methods for which the search space is defined differently. In the first case, it was assumed that each of the Walsh mode coefficients *a<sub>k</sub>* can take any value −*π*/2 < *a<sub>k</sub>* ≤ *π*/2. This was equivalent to taking any point in an (*N* − 1)-dimensional cube in coefficient space (we refer to this as the "hypercube cell"). In the second case, the coefficients were chosen so that they lie only within the Voronoi cell centred at the origin (we refer to this as the "primary Voronoi cell"). This primary Voronoi cell would be a sub-region of the hypercube used in the first case. Based upon the previous analysis, it was known that there would be a single point corresponding to optimum correction in the primary Voronoi cell but multiple such points in the hypercube cell.

Having multiple global optima in the search space can be detrimental when using neural networks to perform such an optimisation. This is because such ill-conditioned problems have no unique answer and thus prevent convergence of the network training. We employed a bespoke NN architecture that was developed to take advantage of the particular physical process used in the sensorless AO scheme. The overall process and the NN architecture are outlined here. More details about the NN can be found in Appendix D. In order to adequately sample the space, a biasing scheme was chosen that used 2*N* − 1 measurements. This corresponded to the application of positive and negative biases of magnitude *π*/3 for each of the (*N* − 1) Walsh modes, excluding piston; an additional unbiased measurement was also included. For the *k*th mode, we denote the negatively biased measurement as *I*<sub>*k*</sub><sup>−1</sup>, the positively biased measurement as *I*<sub>*k*</sub><sup>+1</sup> and the unbiased measurement as *I*<sub>0</sub>.
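The generation of such a measurement set can be simulated in a few lines. This sketch assumes the intensity metric has the pixel-phasor form of Equation (A10) (the squared modulus of the mean pixel phasor) and a Sylvester-ordered Walsh basis:

```python
import numpy as np

def walsh_matrix(N):
    # Sylvester-ordered Hadamard matrix; first row is piston (all ones)
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.block([[W, W], [W, -W]])
    return W

def metric(phases):
    """Intensity metric: squared modulus of the mean pixel phasor."""
    return abs(np.exp(1j * phases).mean()) ** 2

N = 8
W = walsh_matrix(N)
rng = np.random.default_rng(1)
beta = rng.uniform(-np.pi / 2, np.pi / 2, N - 1)   # unknown coefficients
phi = W[1:].T @ beta                               # aberration phase per pixel

I0 = metric(phi)                                   # unbiased measurement
I_plus  = [metric(phi + (np.pi / 3) * W[k]) for k in range(1, N)]
I_minus = [metric(phi - (np.pi / 3) * W[k]) for k in range(1, N)]

# 2N - 1 measurements in total: one unbiased plus a +/- pi/3 pair per mode
assert 1 + len(I_plus) + len(I_minus) == 2 * N - 1
```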

The NN process was constructed as shown in Figure 6; a more detailed description of the architecture is given in Appendix D. The NN takes two separate sets of inputs, both of which rely on biased intensities (generated using Equation (1)): the first (*Input1*) directly uses these normalised intensity values, while the second (*Input2*) analytically processes the intensities based upon sinusoidal estimation to acquire a set of preliminary aberration coefficient estimates. The first input is passed into a convolutional neural network (CNN) followed by fully connected layers (FCL). It is then concatenated with the second input (*Input2*) and passed into fully connected layers to generate the outputs that correspond to the estimated Walsh coefficients. The rationale behind this dual input approach was that the learning task would be easier if based, in effect, on the differences between rough estimates and actual measurements rather than on the raw measurements themselves.

**Figure 6.** Outline of the NN architecture and preprocessing of data. CNN: convolutional neural network; FCL: fully connected layer; OL: output layer.

For the instance of *N* = 8, *Input1* was structured as a matrix of the following form

$$Input1 = \begin{pmatrix} I_0 & I_0 & I_0 \\ I_0 & I_1^{+1} & I_1^{-1} \\ I_0 & I_2^{+1} & I_2^{-1} \\ I_0 & I_3^{+1} & I_3^{-1} \\ I_0 & I_4^{+1} & I_4^{-1} \\ I_0 & I_5^{+1} & I_5^{-1} \\ I_0 & I_6^{+1} & I_6^{-1} \\ I_0 & I_7^{+1} & I_7^{-1} \end{pmatrix} \tag{19}$$

This structure was chosen so that the CNN block could interpret known correlations from biased measurement values. Indeed, the first CNN layer was structured so that it operated first on the 3-tuples of intensity values contained in each row of *Input1*, which were each expected to depend primarily on the corresponding coefficient *a<sub>k</sub>*. Further CNN layers sought to operate on correlations between the different coefficients.

The rough estimation used for *Input2* was based upon the knowledge that variation of a single Walsh mode coefficient leads to a sinusoidal relationship with the detected signal (see Appendix C). The value of each coefficient in *Input2* was set to the value that provided the maximum of this sinusoidal variation. This estimation provides only a rough value for correction, as it treats each coefficient separately and hence does not deal with the coupling effects between combinations of modes.

Separate instances of this NN architecture were trained for the two scenarios. In the first case of the hypercube cell, the training and validation data were obtained by assigning a random value to each input coefficient in the range −*π*/2 < *a<sub>k</sub>* ≤ *π*/2. For the second case, the same combinations of coefficients were "wrapped" so that they lay only within the primary Voronoi cell (see Appendix D). In each case, the same calculated intensity measurements were used at the input. The difference between the two training processes lies in the coefficient labels that were used to calculate the loss function during network training: in the first case, the coefficient labels were defined throughout the hypercube cell; in the second case, the corresponding labels were wrapped into the primary Voronoi cell. Full details of the training procedure are provided in Appendix D.

Results are shown in Figure 7 for the case of *N* = 8, which corresponds to the correction of 7 Walsh modes. The loss function curves showed that only the scenario of confining the coefficients to within the primary Voronoi cell permitted the NN to converge. The mean squared phase error was reduced to 0.063 radians after correction, which corresponded to a Strehl ratio of approximately 0.6. In comparison, the loss for the hypercube case was significantly higher, and the validation loss did not decrease and deviated from the training loss. This demonstrated the effectiveness of using prior knowledge about the lattice symmetries. It is expected that similar trends would be seen in scenarios with larger pixel numbers, alongside more complicated NN architectures.

These results show clearly that prior knowledge of the lattice geometry, and hence the need to confine the estimation to the primary Voronoi cell, is essential for effective training of the NN. When training using coefficients selected from the hypercube cell, the loss functions (which are related to wavefront estimation errors) do not converge sufficiently to provide good aberration correction. This is attributed to the existence of multiple optimal solutions within the hypercube cell that complicate the training process. This problem is avoided when using a training set whose labels were "wrapped" into the primary Voronoi cell, so that only one optimal combination of Walsh mode coefficients is present.

**Figure 7.** (**a**) NN training and validation loss functions as mean square error (MSE) for the scenarios where the coefficient labels were defined throughout the hypercube cell (HC) or the primary Voronoi cell (VC) for *N* = 8. The insets on the right show a schematic representation in three dimensions of the difference between the two types of cells. (**b**) An illustrative example of correction of an initial aberration consisting of 8 pixels, equivalently 7 polar Walsh modes. The residual error of the VC-based correction was far lower than that of the HC-based correction. (**c**) Statistical summary of correction results from the NN validation set: Initial—distribution of initial input aberrations; Sinus—after estimation using the sinusoidal model; HC—after correction using the hypercube cell method; VC—after correction with the Voronoi cell method. Error bars show the standard deviation of the distribution.

#### **12. Conclusions**

The insight provided by the mathematical link between Walsh mode AO and lattices informs the design of efficient sensorless AO schemes. This is relevant to the solution of the inverse problem of how to estimate the aberration coefficients from metric measurements with different applied bias aberrations. The new insights are particularly important when using NNs to solve this inverse problem, as otherwise we suffer from the challenges caused by having multiple solutions for a given set of metric measurements, which requires more complicated networks.

Although the Walsh modes are pixelated, when sufficient pixels are used, they can provide a suitably accurate representation of low-order aberrations. While continuous modes, such as the Zernike polynomials, are commonly used, they are not guaranteed to provide a good fit to high-order aberrations in the system, nor to the correction device, which may well be pixelated in nature. The results presented here have relevance not just for correction of low-order aberrations but also for high-order scattering compensation, where previous schemes have been based around control of Walsh modes (or similar) [15–17].

The analysis presented in this paper was based around a simple AO focussing system. However, as the overall lattice geometry arises from the nature of the aberration representation, and not the AO system or the optimisation metric, a similar repetitive lattice structure and primary Voronoi cell will hold for any other AO system using Walsh modes as the basis. These results should therefore have relevance to any application of AO using pixelated correction devices.

**Author Contributions:** Conceptualization: M.J.B.; methodology: M.J.B. and Q.H.; software: M.J.B., Q.H. and Y.X.; validation: Q.H., J.C., R.T. and Y.X.; investigation: M.J.B., Q.H. and Y.X.; writing original draft preparation: M.J.B. and Q.H.; writing—review and editing: R.T., J.C. and Y.X.; visualization: M.J.B. and Q.H.; supervision: M.J.B.; funding acquisition: M.J.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the European Research Council, grant number 695140 (AdOMiS).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **Appendix A. Observations Based upon the Lattice Geometry**

Various connections can be made between the operation of the Walsh mode-based AO system and the lattice representation outlined above. The following points address certain relationships between variations of individual pixels and the Walsh functions.


These single-pixel variations correspond to the kissing directions, which are the directions of closest spacing between lattice points.


$$\varphi_{rms}^2 = \frac{(N-1)(\psi/N)^2 + (N-1)^2(\psi/N)^2}{N} = (N-1)\left(\frac{\psi}{N}\right)^2 = \frac{N-1}{N^2}\,\psi^2 \tag{A1}$$

Hence, the rms phase is related to the peak-to-peak phase *ψ* by *φ<sub>rms</sub>* = √(*N* − 1) *ψ*/*N*.


For example, when a single pixel is offset by *π* from all the others, the signal becomes

$$I = \frac{(N-2)^2}{N^2} \tag{A2}$$

which tends to 1 as *N* increases and, correspondingly, as the size of a single pixel decreases.

Hence, we can also use this lattice description to show that the signal varies only weakly in these kissing directions (corresponding to single pixels), whereas adjustment of Walsh modes provides more robust measurement. This confirms the results presented in Figure 3 using single pixel and modal variations. These properties are closely related to the extent of the Voronoi cell in these kissing directions. The Voronoi cell is in effect narrower in the kissing directions than in others. In general, larger signal modulations are obtained if one samples the function in the directions where the Voronoi cell has a larger extent.
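The weak single-pixel modulation described above can be confirmed directly. The following sketch assumes the pixel-phasor metric of Equation (A10); offsetting one pixel of an otherwise corrected wavefront by π reproduces Equation (A2):

```python
import numpy as np

def metric(phases):
    """Intensity metric: squared modulus of the mean pixel phasor."""
    return abs(np.exp(1j * phases).mean()) ** 2

N = 8
phases = np.zeros(N)      # fully corrected state, I = 1
phases[0] = np.pi         # displace a single pixel by pi

# Eq. (A2): I = (N-2)^2 / N^2, still close to 1 for modest N
assert np.isclose(metric(phases), (N - 2) ** 2 / N ** 2)
```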

## **Appendix B. Sensorless AO and the Walsh Transform**

Suppose we take a sequence of biased measurements, where we apply biases of each Walsh mode with a single positive amplitude of *π*/2 radians. This is equivalent to shifting the measurement point from the body centre of the (*N* − 1)-dimensional Voronoi cell to the centre of each (*N* − 2)-dimensional facet centred along the positive axis of the corresponding Walsh mode. Due to the lattice structure, the facet in the positive direction is homologous to the facet in the negative direction. Therefore, by choosing this bias amplitude, we are simultaneously sampling both the positive and negative bias positions. When biasing the *k*th mode, we are increasing the (unknown) coefficient *β<sub>k</sub>* by *π*/2. The complex pixel value of that mode is therefore

$$\begin{split} \exp\left(i\left\{\beta_k + \frac{\pi}{2}\right\} W_k^N[l]\right) &= \exp\left(i\frac{\pi}{2} W_k^N[l]\right) \exp\left(i\beta_k W_k^N[l]\right) \\ &= i\, W_k^N[l] \exp\left(i\beta_k W_k^N[l]\right) \end{split} \tag{A3}$$

where we have exploited the fact that the Walsh function has values only of ±1. We can now see that the *k*th biased signal measurement is given from Equation (7) by

$$I_k = \left| \frac{i}{N} \sum_{l=1}^{N} \left\{ \exp\left( i \sum_{k'=0}^{N-1} \beta_{k'} W_{k'}^N[l] \right) W_k^N[l] \right\} \right|^2 = \left| \frac{i}{N} \sum_{l=1}^{N} x_l W_k^N[l] \right|^2 \tag{A4}$$

where we have defined *x<sub>l</sub>* to represent the complex pixel values so that:

$$x_l = \exp\left(i\sum_{k=0}^{N-1} \beta_k W_k^N[l]\right) \tag{A5}$$

The Walsh transform (WT) *X<sub>k</sub>* of a sequence *x<sub>l</sub>* of length *N* is defined conventionally as [8]:

$$X_k = \frac{1}{N} \sum_{l=1}^{N} x_l W_k^N[l] \tag{A6}$$

We can therefore derive

$$I_k = \left| X_k \right|^2 \tag{A7}$$

In other words, we see that the *k*th biased intensity measurement is the modulus squared of the *k*th component of the WT of the complex pixel values *x<sub>l</sub>*. As the first Walsh function is piston (hence all pixels have value 1), it is related to the first element of the WT *X<sub>k</sub>*. As biasing with piston has no effect on the measurement, this is the same as the unbiased measurement in our system. Hence, the unbiased measurement (corresponding to the centre of the Voronoi cell) along with the *N* − 1 biased measurements correspond to a full set of intensity measurements. This set of intensity measurements is known as the spectral density of the WT of the set of complex pixel values. In practice, we cannot measure the complex values of *X<sub>k</sub>* directly, but we are able to quantify the Walsh spectral density through the intensity measurements with the applied bias modes.

We now derive a relationship using the fundamental properties of the WT. In [8], we find an equivalent of Parseval's theorem for WTs:

$$\sum_{k=0}^{N-1} |X_k|^2 = \frac{1}{N} \sum_{l=1}^{N} |x_l|^2 \tag{A8}$$

From the definition of *x<sub>l</sub>*, it is clear that its modulus is equal to one. Hence, we find that

$$\sum_{k=0}^{N-1} |X_k|^2 = 1 \tag{A9}$$

Equivalently, we find that the sum of the *N* − 1 biased measurements and the one unbiased measurement must add to one.

The importance of this result is that in a real experiment the measured signal is not actually normalised to one, but there is an unknown "brightness" that is a function of numerous experimental variables. This would be a multiplying factor in the expressions for *I*, which we have chosen to omit from the analysis for simplicity. We can use this result to obtain the "brightness" in a way that is independent of the input aberration, as it is simply the sum of the *N* − 1 biased measurements and the unbiased measurement.
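This normalisation property is easy to verify numerically. The sketch below assumes the pixel-phasor metric and a Sylvester-ordered Walsh basis; the unbiased measurement plus the *N* − 1 measurements biased by *π*/2 per mode sum to the total normalised brightness of one:

```python
import numpy as np

def walsh_matrix(N):
    # Sylvester-ordered Hadamard matrix; first row is piston (all ones)
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.block([[W, W], [W, -W]])
    return W

def metric(phases):
    """Intensity metric: squared modulus of the mean pixel phasor."""
    return abs(np.exp(1j * phases).mean()) ** 2

N = 8
W = walsh_matrix(N)
beta = np.random.default_rng(2).uniform(-np.pi, np.pi, N - 1)
phi = W[1:].T @ beta              # arbitrary Walsh-mode aberration

# unbiased measurement plus the N-1 measurements biased by pi/2 in each mode
total = metric(phi) + sum(metric(phi + (np.pi / 2) * W[k]) for k in range(1, N))

# Walsh-domain Parseval relation (Eqs. A8 and A9): the measurements sum to one
assert np.isclose(total, 1.0)
```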

#### **Appendix C. Estimation of Coefficients Using Simple Sinusoidal Model**

In this appendix, we present a method for rough initial estimation of individual Walsh mode coefficients. Let us redefine Equation (7) by including a factor *A* that is equivalent to an unknown maximum intensity

$$I = A \left| \frac{1}{N} \sum_{l=1}^{N} \exp\left( i \sum_{k=0}^{N-1} \beta_k W_k^N[l] \right) \right|^2 \tag{A10}$$

The signal varies with a single modal coefficient *β<sup>j</sup>* as

$$\begin{split} I(\beta_j) &= A \left| \frac{1}{N} \sum_{l=1}^{N} \exp\left( i \sum_{k=0;\, k \neq j}^{N-1} \beta_k W_k^N[l] \right) \exp\left( i \beta_j W_j^N[l] \right) \right|^2 \\ &= A \left| \frac{1}{N} \sum_{l=1}^{N} C[l] \exp\left( i \beta_j W_j^N[l] \right) \right|^2 \end{split} \tag{A11}$$

where the effects of the modes contained within the first exponential term have been subsumed into the complex coefficients *C*[*l*].

We introduce a useful relationship, which takes advantage of the binary (±1) values of the Walsh functions [9]:

$$\exp\left(i\beta_j W_j^N[l]\right) = \cos\left(\beta_j W_j^N[l]\right) + i\sin\left(\beta_j W_j^N[l]\right) = \cos\beta_j + i\, W_j^N[l] \sin\beta_j \tag{A12}$$

This leads to

$$I(\beta_j) = A \left| \frac{1}{N} \sum_{l=1}^{N} C[l] \left\{ \cos\beta_j + i\, W_j^N[l] \sin\beta_j \right\} \right|^2 \tag{A13}$$

The exact form of this solution will depend upon *C*[*l*] and hence on the other Walsh coefficients. However, the only terms in *β<sub>j</sub>* that can arise from the modulus squared are of the form cos<sup>2</sup> *β<sub>j</sub>*, cos *β<sub>j</sub>* sin *β<sub>j</sub>*, or sin<sup>2</sup> *β<sub>j</sub>*, which can all be expressed as sinusoidal terms of argument 2*β<sub>j</sub>*. Hence, we deduce that *I*(*β<sub>j</sub>*) must be given by the form

$$I(\beta_j) = A \left[ p + q \cos\left( 2\beta_j + \zeta \right) \right] \tag{A14}$$

where *p*, *q* and *ζ* depend on all values *β<sub>k≠j</sub>*. Let us simplify Equation (A14) to the form *I*(*θ<sub>j</sub>*) = *U* + *V* cos 2*θ<sub>j</sub>*, where we have defined 2*θ<sub>j</sub>* = 2*β<sub>j</sub>* + *ζ*.

Consider applying a bias *bW*<sub>*j*</sub><sup>*N*</sup>, so that the biased measurement is *I*<sub>+</sub> = *U* + *V* cos(2*θ<sub>j</sub>* + 2*b*). Then, we apply the negative bias −*bW*<sub>*j*</sub><sup>*N*</sup>, so that the biased measurement is *I*<sub>−</sub> = *U* + *V* cos(2*θ<sub>j</sub>* − 2*b*). We also use the unbiased measurement *I*<sub>0</sub> = *U* + *V* cos 2*θ<sub>j</sub>*. If we use a bias of *b* = *π*/3, then we can obtain through elementary operations

$$\theta_j = \frac{1}{2} \tan^{-1}\left[ \sqrt{3}\, \frac{I_+ - I_-}{I_+ + I_- - 2I_0} \right] = \beta_j + \frac{\zeta}{2} \tag{A15}$$

The value of *θ<sub>j</sub>* obtained through Equation (A15) was used as the initial rough estimate provided to the NN as *Input2*. The error between this estimate and the actual coefficient is given by *β<sub>j</sub>* − *θ<sub>j</sub>* = −*ζ*/2.
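As a minimal sketch of this estimator, consider a single-mode aberration, for which *C*[*l*] = 1, ζ = 0 and Equation (A15) recovers the coefficient exactly (pixel-phasor metric and Sylvester-ordered Walsh basis assumed):

```python
import numpy as np

def walsh_matrix(N):
    # Sylvester-ordered Hadamard matrix; first row is piston (all ones)
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.block([[W, W], [W, -W]])
    return W

def metric(phases):
    """Intensity metric: squared modulus of the mean pixel phasor."""
    return abs(np.exp(1j * phases).mean()) ** 2

N, j, beta_j = 8, 3, 0.3
W = walsh_matrix(N)
phi = beta_j * W[j]               # aberration consisting of a single mode

b = np.pi / 3                     # bias amplitude used in the text
I_p = metric(phi + b * W[j])
I_m = metric(phi - b * W[j])
I_0 = metric(phi)

# Three-point sinusoidal estimate of Eq. (A15)
theta = 0.5 * np.arctan(np.sqrt(3) * (I_p - I_m) / (I_p + I_m - 2 * I_0))
assert np.isclose(theta, beta_j)  # zeta = 0 for a single-mode aberration
```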

#### **Appendix D. Description of the Neural Network Training and Network Architecture**

The training data consisted of 2<sup>20</sup> (about one million) simulated samples, and 2<sup>7</sup> (128) samples were used for validation to avoid overfitting. Both sets of data were generated for the case of *N* = 8, corresponding to the correction of seven Walsh modes (excluding piston). The total number of training datasets was chosen to provide a sufficient representation of the seven-dimensional search space.

For each training dataset, seven values were randomly generated corresponding to the coefficients of seven polar Walsh modes (*β<sub>k</sub>*). The coefficients followed a uniform distribution over the range of −*π*/2 to *π*/2. These coefficients were used as labels during network training in the case of the hypercube cell.

The sum of the products between each coefficient and its corresponding unit Walsh mode formed the phase aberration Ψ(**r**) according to Equation (3). The mean phasor was calculated by averaging the complex field over the pupil. By subtracting the angle of this mean phasor (Equation (A16)) and wrapping the phase back within the range of ±*π*, the aberration Ψ′(**r**) was calculated, which is equivalent to Ψ(**r**) and lies within the primary Voronoi cell.

$$\Psi'(\mathbf{r}) = \arg\left(\exp\left\{i\left[\Psi(\mathbf{r}) - \arg\left(\frac{1}{\pi}\int \exp[i\Psi(\mathbf{r})]\, d\mathbf{r}\right)\right]\right\}\right) \tag{A16}$$

Using the orthogonality of Walsh modes, the new coefficients *β<sub>k</sub>*′ of the equivalent aberration Ψ′(**r**) could be calculated (Equation (A17)). The *β<sub>k</sub>*′ were used as the labels when training the network with a confined search space.

$$
\beta_k' = \frac{1}{\pi} \int \Psi'(\mathbf{r})\, \omega_k(\mathbf{r})\, d\mathbf{r} \tag{A17}
$$
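On the discrete pixel grid, this wrapping reduces to elementwise operations and sums over pixels. The sketch below assumes a Sylvester-ordered Walsh basis and the pixel-phasor metric; the wrapped coefficients describe an aberration equivalent to the original one:

```python
import numpy as np

def walsh_matrix(N):
    # Sylvester-ordered Hadamard matrix; first row is piston (all ones)
    W = np.array([[1]])
    while W.shape[0] < N:
        W = np.block([[W, W], [W, -W]])
    return W

def metric(phases):
    """Intensity metric: squared modulus of the mean pixel phasor."""
    return abs(np.exp(1j * phases).mean()) ** 2

N = 8
W = walsh_matrix(N)
beta = np.random.default_rng(3).uniform(-np.pi / 2, np.pi / 2, N - 1)
psi = W[1:].T @ beta              # Psi evaluated on the pixels

# Eq. (A16), discretised: remove the argument of the mean phasor, wrap to +/- pi
psi_w = np.angle(np.exp(1j * (psi - np.angle(np.exp(1j * psi).mean()))))

# Eq. (A17), discretised: project back onto the non-piston Walsh modes
beta_w = (W[1:] @ psi_w) / N

# The wrapped coefficients give the same metric value as the originals
assert np.isclose(metric(psi), metric(W[1:].T @ beta_w))
```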

For each set of data, two phase biases per Walsh mode were introduced. The bias amplitudes were chosen to be −*π*/3 and *π*/3. For each introduced bias phase, the intensity signal (*I*<sub>−</sub> and *I*<sub>+</sub>) was computed using Equation (1). The same equation was used to calculate the signal when no bias phase was introduced (*I*<sub>0</sub>). From previous discussions, the phase modulation of mode 1 (piston) would have no effect on the signal and was thus excluded from the collection of signal readings. The total of 15 signal readings (14 biased and 1 unbiased) was used as the input (*Input1*) of the network.

In addition, from our previous discussion, the signal varies with each mode coefficient with a period of *π*. A good approximation to the coefficient of each mode (*β<sub>k</sub>*″) was obtained using Equation (A15). These approximations were also used as the separate input (*Input2*) to the two networks we trained. Figure A1 shows a few sets of *β<sub>k</sub>*, *β<sub>k</sub>*′ and *β<sub>k</sub>*″ derived from the same initial aberration.

**Figure A1.** (**a**–**p**) 16 sets of randomly selected *β*, *β*′ and *β*″ derived from the same aberration, shown in blue, red and yellow plots, respectively. In some cases, *β*″ (estimation using the sinusoidal model) closely resembled *β*′ (such as in case (**f**)), while in other cases *β*″ could differ from *β*′ (such as in (**e**) and (**n**)). There was also a small proportion of cases where *β* was identical to *β*′ (such as (**n**)), which corresponded to the initial aberration being within the primary Voronoi cell.

The full diagram of the NN architecture is shown in Figure A2. The network was built using TensorFlow Keras. For the first CNN layer, the padding was chosen to be "valid" and the strides were equal to (1,1). For each of the second to fourth CNN layers, the padding was chosen to be "same" and the strides were equal to (1,1). Each of the second to fourth convolutional layers was followed by a max-pooling layer with a pool size of 2 × 1. For all layers except the output layer, the nonlinear activation was chosen to be "tanh"; the activation of the output layer was linear. The initializer of all the kernels was Glorot uniform. The loss function was the mean squared error (MSE), and the optimizer was Adam.

**Figure A2.** Full neural network architecture. The blue boxes are the two inputs to the network. The yellow boxes are the trainable kernels of the CNN, with the corresponding dimensions as shown. The orange boxes are the internal layers of the CNN and the dense layers in the later stages. The green box represents the output of the CNN.

## **References**

