training. Note that *N* and *P* are set to 1 in Algorithm 1 to simplify the notation.

**Algorithm 1:** Quasi-reinforcement learning algorithm for TANN

**Input:** Measurement model $M: \mathbb{C}^n \to \mathbb{R}^m_+$, reward function $R: \mathbb{C}^n \times \mathbb{C}^n \to [0, 1]$

**Output:** Trained target adaptive neural network $\mathrm{TANN}: \mathbb{C}^{mn \times n} \times \mathbb{C}^n \to \mathbb{C}^{n \times m}$

	- a. Measure the square roots of the intensities, $b \in \mathbb{R}^m_+$, of $z$ by *M*.
	- b. Compute the matrix $W \in \mathbb{C}^{n \times m}$ for $z^d$ by *TANN* to define *NN*.
	- c. Compute the recovered field $\tilde{z} \in \mathbb{C}^n$ from the amplitudes $b$ by *NN*.
	- d. Compute the reward $r = R(z, \tilde{z})$.
	- e. Update the parameters of *TANN* to maximize $r$.
	- f. Perform a phase correction $z = z \cdot e^{i(\arg(z^d) - \arg(\tilde{z}))}$.
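
The data flow of steps a, c, d, and f can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the measurement model is taken to be a random transmission matrix `A`, the NN is taken to be a single complex linear map applied to the amplitudes, and the reward is assumed to be a normalized squared overlap. The matrix `W` is a random placeholder for the output of a trained TANN (step b), and the parameter update of step e is omitted, so the sketch shows shapes and operations only, not convergence. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 7, 70  # beams and measurements, as in the 7-beam example

# Assumed measurement model M: a random complex transmission matrix A.
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * n)

def measure(z):
    """Step a: square roots of the measured intensities, b = |A z|."""
    return np.abs(A @ z)

def reward(z, z_rec):
    """Step d (assumed form): normalized squared overlap, always in [0, 1]."""
    num = np.abs(np.vdot(z, z_rec)) ** 2
    den = np.vdot(z, z).real * np.vdot(z_rec, z_rec).real
    return float(num / den)

def phase_correction(z, z_d, z_rec):
    """Step f: z <- z * exp(i (arg z_d - arg z_rec))."""
    return z * np.exp(1j * (np.angle(z_d) - np.angle(z_rec)))

# Unit-amplitude field and target with random phases.
z = np.exp(1j * rng.uniform(-np.pi, np.pi, n))
z_d = np.exp(1j * rng.uniform(-np.pi, np.pi, n))

# Placeholder for step b: in the paper, W is produced by TANN from z_d.
W = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

b = measure(z)                           # step a
z_rec = W @ b                            # step c
r = reward(z, z_rec)                     # step d
z_new = phase_correction(z, z_d, z_rec)  # step f
```

Because step f multiplies each component by a pure phase factor, the beam amplitudes are unchanged; only the phases move.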

#### **4. Simulations**

Simulations were performed for *N* = 1024, *P* = 256, and *T* = 8, with a maximum of 5000 learning epochs. Signals *z* and targets *z<sup>d</sup>* were generated as complex vectors with unit amplitudes and phases uniformly distributed on $[-\pi, \pi]$. The initial values of $U \in \mathbb{C}^{mn \times n}$ were drawn from a standard normal distribution. Step 6a of Algorithm 1 was performed with a mathematical model instead of the experimental setup itself, which accelerated the learning process significantly. The mathematical model could be either a transmission matrix model *TM* or another neural network, referred to as NN-G in [22]. Computations were run on a Windows 10 computer with an NVIDIA GTX 1660 Ti GPU, an AMD Ryzen 5 3600X 6-core CPU, and 32 GB of RAM. The TANN model was implemented and trained with TensorFlow 2.5.0 and Python 3.7. TensorFlow encapsulates the interaction with the GPU, so no additional parallelization effort, including multicore parallelization, was needed. In addition, a MATLAB graphical program, running as a single process, was created to interact with the experimental setup.
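
The data generation described above can be sketched as follows. This is an illustrative NumPy version rather than the TensorFlow code used for the study. It assumes that *N* counts targets and *P* counts signals, and that the real and imaginary parts of the initial parameters are each drawn from a standard normal distribution; the function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_phase_vectors(count, n, rng):
    """Complex vectors with unit amplitudes and phases uniform on [-pi, pi]."""
    phases = rng.uniform(-np.pi, np.pi, size=(count, n))
    return np.exp(1j * phases)

def init_parameters(m, n, rng):
    """Initial U in C^{mn x n}; real and imaginary parts are each
    standard normal (an assumption about how the complex draw is made)."""
    shape = (m * n, n)
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

n, m = 7, 70       # 7 beams, 70 measurements
N, P = 1024, 256   # assumed: number of targets and number of signals

targets = random_phase_vectors(N, n, rng)  # z_d
signals = random_phase_vectors(P, n, rng)  # z
U = init_parameters(m, n, rng)
```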

As a particular example similar to the experiments reported below, Figure 3a shows the evolution of the reward during training for a 7-beam array with 70 measurements in the scattered pattern. The reward evolves quickly and continuously toward its maximum value in about 100 epochs. This means that the phasing quality reaches its maximum for any desired phase profile. The training required about 13 s. The phase correction process using this trained TANN shows (Figure 3b) that an average of only three iterations was enough to reach the 0.96 reward limit in a noiseless numerical study.

To give a fuller picture of the capabilities of TANN, additional results are presented in Figure 4. It was observed numerically that, to achieve a sufficiently high reward, say *r* > 0.96, there is a minimal required ratio *m*/*n* that depends on *n*: when the beam count varies from 4 to 20, the required *m*/*n* ratio increases from 4 to 12. To map this dependence, different TANNs were trained for various numbers of beams *n* ∈ {4, 6, 8, 10, 12, 14, 16, 20} and various ratios between the number of measurements and the number of beams *m*/*n* ∈ {2, 4, 6, 8, 10, 12, 14, 16, 18, 20}. The maximal achievable reward was recorded and visualized as a heat map in Figure 4a, with the corresponding relative training time shown in Figure 4b. The maximal achievable reward was obtained by solving 1000 phase correction problems with different targets for each combination of *n* and *m*/*n*, and computing the 95% quantile of the rewards at the last correction. This statistic reveals the minimal reward obtained in 95% of the test problems.

**Figure 3.** (**a**) Reward evolution during TANN training for 7 beams and 70 detectors with *T* = 8 correction steps. (**b**) Average evolution of the quality factor Q over the phase correction iterations of the trained system (100 random initial phase sets, 7 beams, 70 detectors). The red dotted line shows the 96% threshold. On average, only 3 correction steps are required to reach the threshold phasing quality.

**Figure 4.** (**a**) Heat map of the maximal achievable mean reward in grey scale and (**b**) the corresponding relative training time. The red line in (**a**) approximates the separation line for which *r* = 0.96. The relative time in (**b**) is computed by dividing the learning time in seconds for each *n* and *m*/*n* by the minimal time, which gives GPU-invariant information. The minimal time required by the GPU used in this paper was 13 s.

The red line in Figure 4a reveals the dependency between *n* and *m*/*n* required to obtain *r* = 0.96 and is defined as $f(n) = \frac{n}{2} + 1$. This gives the minimal number of measurements needed to obtain $r \geq 0.96$, namely $m = \frac{n^2}{2} + n$.
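
As a quick check of this rule of thumb, the short script below evaluates the separation line $f(n) = n/2 + 1$ and the resulting minimal measurement count $m = n \cdot f(n) = n^2/2 + n$ for a few beam counts. Rounding up to an integer number of measurements is an added assumption, and the function names are hypothetical.

```python
import math

def min_ratio(n):
    """Fitted separation line f(n) = n/2 + 1 for reaching r = 0.96."""
    return n / 2 + 1

def min_measurements(n):
    """Smallest integer m with m/n >= f(n), i.e. m >= n**2/2 + n."""
    return math.ceil(n * min_ratio(n))

for n in (4, 7, 20):
    print(n, min_ratio(n), min_measurements(n))
# 4 3.0 12
# 7 4.5 32
# 20 11.0 220
```

For *n* = 7 the rule gives *m* = 32, so the 70 measurements used in the 7-beam example (*m*/*n* = 10) lie well above the fitted minimum.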

#### **5. Experiments**

We applied TANN associated with quasi-reinforcement learning to the phase-locking of a seven-amplifier laser system. As in a conventional CBC configuration, the setup (Figure 5) comprised a master oscillator (MO, a CW semiconductor laser at 1064 nm) seeding seven parallel polarization-maintaining (PM) fiber amplifiers. Their inputs were equipped with fiber-coupled LiNbO<sub>3</sub> electro-optic phase modulators (EOM) and their outputs, once collimated by microlenses (µlens), formed a compact 1D array of laser beams (250 µm beam waist and 500 µm pitch) in a tiled-aperture arrangement (Figure 5). We used a master diode laser delivering 1064 nm radiation because most of the components used to split and modulate the light feeding the amplifier array were already in our stock and designed to operate at this popular wavelength. The wavelength choice does not impact the working principle of the investigated technique. The master laser delivered about 80 mW of polarized light. Each individual output of the double-stage polarization-maintaining fiber amplifier array was limited to about 1 W of collimated polarized laser light by the available pump power. A beam splitter (BS) split the laser array output into a power fraction and a control fraction for the phase-locking loop. The adaptive phase correction loop contained a phase-sensing module made of a ground glass diffuser [14,17], which made the individual beams interfere on a 1D photodetector array. Only sparse samples of the interference pattern were collected and served as a phase-to-intensity encoding. We used only 70 intensity measurements from non-adjacent, periodically spaced pixels of the photodetector array. These data fed the digitizing and processing unit, which comprised the AD/DA converters and the QRL-trained TANN that first computed the NN to be used in the loop. The TANN received the target phase chart, which could be changed on demand, from a computer or any other external device. The processing unit delivered the phase corrections to apply to the seven electro-optic modulators. The far field of the BS main output was displayed on a camera with a positive lens for observation and performance analysis (not shown in Figure 5).

**Figure 5.** Left: setup of the 7-fiber amplifier array used in the reported experiments on on-demand phase control using an NN. The master oscillator was a semiconductor laser. EOM denotes a LiNbO<sub>3</sub> electro-optic modulator; the double-stage ytterbium-doped fiber amplifiers were polarization-maintaining with 1 W output power; µlens stands for microlens array, BS for beam splitter, and D for diffuser. Right: photograph of the 1D-array output and of the phase analysis module.

The learning step of the TANN requires a large amount of training data. Because generating suitable data experimentally takes a long time, we obtained the training data by computation, using the measured transmission matrix (TM) of the scattering device that maps phase into intensity [14,24,25]. From the TM, we generated a large number of training data for the TANN quasi-reinforcement learning. We set *T* = 8 as the number of correction loops in the QRL process. That number results from a previous numerical study and appears to offer a good trade-off between speed and accuracy. Optimization of the TANN parameters typically required a minimum of 100 epochs of 256 phase/intensity pairs and 1024 target phase batches to reach a reward *R* of 99%. Figure 6 shows a typical evolution of the reward *R* versus the number of epochs during the TANN learning process with the data from the experimental TM.

**Figure 6.** Reward evolution during the TANN learning process for the 7-fiber laser array with *T* = 8 correction steps.

Once TANN was trained, we used it to compute the NN embedded in the feedback loop for phase-locking the laser array. The NN quickly and efficiently locked the laser system to the in-phase state as shown in Figure 7, despite the standing phase fluctuations in the various amplifier arms. The laser exhibited the expected far field pattern (Figure 7a), very similar in shape and magnitude to the theoretical one for an in-phase beam array (Figure 7b).

**Figure 7.** (**a**) Experimental far field of the 7-fiber laser array locked in-phase. (**b**) Experimental and theoretical profiles of the phase-locked fiber laser array.

The NN phase correction process locked the laser system with a measured coherent combining efficiency of ~93%, derived from the signal of a photodiode located in the center of the far field. This corresponds to less than λ/20 RMS residual deviation from a perfectly uniform discrete wavefront in the beam array.
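
The link between the two figures can be checked with a standard statistical model. The model is an assumption here and not necessarily the derivation the authors used: for $N$ equal-amplitude beams with independent Gaussian phase errors of standard deviation $\sigma$, the expected normalized on-axis intensity is $\eta = 1/N + (1 - 1/N)\,e^{-\sigma^2}$. Inverting this expression for $\eta = 0.93$ and $N = 7$ gives an estimate of the residual RMS phase error. The function names are hypothetical.

```python
import math

def expected_efficiency(sigma_rad, n_beams):
    """Expected normalized on-axis intensity for i.i.d. Gaussian phase errors."""
    return 1 / n_beams + (1 - 1 / n_beams) * math.exp(-sigma_rad ** 2)

def rms_phase_error(efficiency, n_beams):
    """Invert expected_efficiency for the RMS phase error in radians."""
    coherent = (efficiency - 1 / n_beams) / (1 - 1 / n_beams)
    return math.sqrt(-math.log(coherent))

sigma = rms_phase_error(0.93, 7)   # radians
waves = sigma / (2 * math.pi)      # fraction of a wavelength
print(f"sigma = {sigma:.2f} rad, i.e. lambda/{1 / waves:.1f}")
```

Under this model the estimate is roughly 0.29 rad, a little smaller than λ/20 (about 0.31 rad), which is consistent with the statement above.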

A photodiode measured the on-axis peak intensity in the array far field. To quantify the phase-locking stability of the laser system, we recorded 10 million samples of its signal over 2.8 s (Figure 8, in-phase locking case). The samples were then analyzed to plot their probability density for the OFF (open loop) and ON (closed loop) states. When the feedback loop was open, the signal probability density (black curve in Figure 8b) was spread widely over a medium voltage range. In contrast, when the servo was ON (red trace), the histogram showed a sharp peak at a higher voltage (0.93), which corresponds to the average combining efficiency, with a 1.2% standard deviation. This demonstrates that the NN-based phase control system offers an efficient and stable locking of the fiber laser array output. The power spectral density (PSD) of the same photodiode signal is given in Figure 8c. It shows that the servo loop corrected the phase fluctuations of the combined beam array up to 1.5 kHz, while the servo loop operated at a frequency of 11 kHz, limited by the speed of the loop controller (NI PXIe-1071). The analysis of numerous on/off servo transitions shows that the average number of phase corrections needed to reach an efficient phase-locking level is about 6, which is quite low although slightly larger than the number derived from noiseless numerical simulations.

**Figure 8.** (**a**) Normalized evolution of the combined beam power detected by a photodiode located at the far field center when the NN servo is OFF then ON. (**b**) Normalized histogram of the combined laser power over time when the servo is OFF (black) then ON (red). (**c**) Power spectral density of the 7-fiber laser array when the NN servo is OFF (black) and ON (red) and their moving averages (green and blue traces, respectively).

When TANN computed the NN in the phase correction loop for setting a non-uniform phase map, the excellent operation of the system was preserved. A few examples of specific phase charts, most of which can be easily recognized by the naked eye, are given in Figure 9. The desired phase map for the beam array can be any arbitrary phase state, and it could be changed on demand in real time during the laser system operation. Figure 10a reports a sequence of repeated changes of the desired target. The vertical scale denotes the errors in the individual beams' phases with respect to their steady-state values corresponding to the desired state. The plotted parameter is an intensity correlation between the scattered pattern at the time considered and the one at the end of each cycle. The demanded phase chart was changed periodically, and each change caused a sudden drop of this parameter. Each time, the system quickly restored a value close to the maximum achievable. This means that the system repeatedly achieved a fast and stable setting to the newly requested phase relationships. Figure 10b presents the statistical data of experimental convergence to 1000 arbitrary target phase maps on a very short time scale. This graph shows that, regardless of the target phases, the TANN phase control system set the fiber laser output to the desired phases within about six rounds of correction, i.e., here within 550 µs.

**Figure 9.** Examples of experimental far field patterns of phase-locked fiber laser output and their associated target phase sets.

**Figure 10.** Experimental sequence of periodic target phase changes showing the evolution of the speckle pattern intensity correlation. Vertical scale denotes errors in the individual beams' phase with respect to their steady state values corresponding to the desired state. (**a**) Red dotted lines mark the times of target phase changes, (**b**) same experimental sequence of data folded in a single cycle, highlighting the dynamics toward a steady state phase profile for 1000 abrupt phase changes. One iteration of the phase correction loop took 92 µs.
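
The timing figures quoted in this section can be cross-checked with simple arithmetic: six correction rounds at 92 µs per iteration give the reported settling time of about 550 µs, and the inverse of the iteration time gives the reported loop rate of about 11 kHz. The constant names below are hypothetical.

```python
ITERATION_TIME_S = 92e-6   # one pass of the phase correction loop
ROUNDS_TO_CONVERGE = 6     # average number of corrections observed

settling_time_s = ROUNDS_TO_CONVERGE * ITERATION_TIME_S
loop_rate_hz = 1 / ITERATION_TIME_S

print(f"settling time: {settling_time_s * 1e6:.0f} us")  # settling time: 552 us
print(f"loop rate: {loop_rate_hz / 1e3:.1f} kHz")        # loop rate: 10.9 kHz
```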

#### **6. Conclusions**

We have reported an improved version of a phase-locking technique for a laser beam array, based on a neural network and quasi-reinforcement learning, that offers a quick on-demand change of the transverse phase distribution in the array. The NN is included in a feedback loop and computes the phase correction from data measured in a scattered pattern of the output. Instead of training the NN for a given target, as previously studied, the original idea presented here is to train a preliminary network, TANN, that computes the NN parameters suited to the desired phase map. The calculation by TANN is an order of magnitude faster than training the NN. Thus, the NN quickly accommodates any change of the desired phase set, so that the new architecture forms a truly adaptive phase-locking system. We first analyzed the proposed approach by simulation of an array of 2 to 20 beams. The training time of TANN was short, requiring approximately 5 min for 20 beams. The phasing accuracy was high with the NN computed by TANN, and the dynamics of phase-locking were fast, needing only a few phase error correction steps (three iterations on average for a seven-beam array), regardless of the target phase set. We analyzed how the sparsity of the sampling of the scattered pattern in the phase-sensing module affects performance. A rule of thumb was derived for the lowest number of measurements needed to obtain a sufficiently high phasing accuracy. The technique can be applied to any geometry of the near field array, including 1D, 2D, triangular or square lattices, rings, etc.

In the second step, we implemented the technique on a 7-channel fiber laser delivering multi-watt linearly polarized laser radiation at 1064 nm in a 1D beam array. This experiment, with double-stage fiber amplifiers, demonstrated the efficiency of the quasi-reinforcement learning approach in setting and locking the array output on a requested target phase set. This represents, to the best of our knowledge, the first time that a real laser beam array, with many independent and long amplifying arms, was phase-locked using an NN approach. The phase-lock loop featured a phasing accuracy close to λ/20 RMS and a measured bandwidth above 1 kHz. We presented the adaptive behavior of the system with respect to the target choice and analyzed its dynamics. The response time to a new request was measured at approximately 550 µs in the non-optimized configuration. This is sufficiently fast, for example, to compensate for first-order perturbations of the atmosphere if the device were connected to an appropriate sensor.

**Author Contributions:** Formal analysis: M.S., G.M. and P.A.; Investigation: M.S.; Project administration: V.K.; Software: G.M.; Supervision: P.A. and A.D.-B.; Validation: A.B. (Alexandre Boju); Visualization: A.B. (Alexandre Boju) and A.B. (Alain Barthelemy); Writing – original draft: A.B. (Alain Barthelemy) and V.K.; Writing – review and editing: A.D.-B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Agence Nationale de la Recherche (ANR-10-LABX-0074-01) and CILAS Company (Ariane Group) under grant no. 2016/0425.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, nor in the decision to publish the results.

#### **References**

