#### **1. Introduction**

Coherent beam combining (CBC) of multiple emitters is a versatile technique for delivering high average power or high-energy short pulses while maintaining beam quality [1]. CBC architectures distribute the laser power over a set of amplification channels arranged in parallel. Because of thermal effects and mechanical instabilities, the piston phase of each channel must be adjusted over time to maintain the combining efficiency and the wavefront quality of the combined beam. The combining step can be performed in two ways: the tiled-aperture and the filled-aperture techniques. In the first configuration, the amplified beams are placed side by side to form a large synthetic pupil and are then coherently overlapped in the far field. In the second configuration, they are superimposed in the near field by splitters or by a diffractive optical element (DOE) to obtain a single high-power beam. The tiled-aperture arrangement offers the opportunity to dynamically shape the synthetic wavefront by tuning the piston phase of each element of the array to a desired value. This dynamic shaping could be particularly useful for compensating phase aberrations due to atmospheric perturbations in the context of directed energy production [2,3]. CBC was also recently investigated as a means of shaping the far field pattern of a high-power beam array. In particular, T. Hou et al. numerically validated the generation of orbital angular momentum (OAM) laser beams in a tiled-aperture architecture [4]. In 2021, M. Veinhard et al. demonstrated OAM beam shaping by tailoring the phase of 61 beams in the femtosecond regime [5]. These specific modes, which preserve their ring intensity profile during propagation, are of interest in many areas such as particle manipulation and free-space propagation. Moreover, real-time control of the intensity shape at focus by CBC at a high power level can optimize the performance of material processing.

**Citation:** Shpakovych, M.; Maulion, G.; Boju, A.; Armand, P.; Barthélémy, A.; Desfarges-Berthelemot, A.; Kermene, V. On-Demand Phase Control of a 7-Fiber Amplifiers Array with Neural Network and Quasi-Reinforcement Learning. *Photonics* **2022**, *9*, 243. https://doi.org/10.3390/photonics9040243

Received: 14 March 2022 Accepted: 1 April 2022 Published: 6 April 2022


**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

An active coherent combining device with fiber amplifiers is based on a master oscillator power amplifier (MOPA) configuration in which multiple parallel fiber amplifiers undergo internal and environmental perturbations. The phase fluctuations at the output of the fiber array are compensated by electro-optic modulators whose commands are derived either from direct measurements of the current output phase state [6,7] or from an iterative phase correction that optimizes a given parameter [8–11]. In the latter case, the loop performing the phase correction includes an optimization algorithm such as the popular stochastic parallel gradient-descent (SPGD) method or the alternating projection (AP) method [12–14]. With the SPGD method, the correction bandwidth drops significantly as the beam count increases. The AP method, by contrast, is well suited to phase-locking a large beam array, at the expense of a large number of detectors. A third method, based on neural networks and deep learning, was recently investigated.

Among the many applications of neural networks (NN) in optics, only a few recent publications have dealt with CBC [15–22], and most of them reported numerical studies. Some contributions investigated NNs for direct, one-step phase recovery of the beam array from patterns scattered through a diffuser [17] or through a diffractive optical element (DOE) [20]. In the latter case, an NN with only two layers provided accurate phase recovery, but in a limited phase error range. Although it was trained in a limited range, once applied in a feedback system for phase correction, the technique was able to compensate for the full range [−π, π] of random initial phase errors and to reach phase-locking. It required approximately 40 iterations on average to lock a 9 × 9 array, which was shown to be ten times faster than SPGD optimization. Reinforcement learning was also considered as a second option for beam combining with NNs [15,19,21]. In a first experiment with a two-fiber interferometer [15], the authors demonstrated that the technique could be as efficient as a standard PID (proportional-integral-derivative) controller or as SPGD. Simulations of deep reinforcement learning with a deep deterministic policy gradient used the far field pattern as input to the NN, and phase-locking was shown to require 6 to 12 iterations for a 7-beam array [19]. However, the authors raised scalability issues for large arrays, in particular owing to the dimensionality of the training data set, a loss in accuracy, and the duration of the training. The approach offers the additional capability of tailoring the array far field, for example for the generation of orbital angular momentum (OAM) beams [18]. In a recent publication [22], we proposed a third option, called quasi-reinforcement learning (QRL), in which the NN is trained for phase-locking specifically for operation in a loop with a given number of iterations. Simulations and a proof-of-principle experiment demonstrated efficient and fast (six iterations) phase-locking of a 100-beam array.

In this paper, we first report a new version of this machine-learning scheme that provides access to instantaneous tailoring and locking of a tiled beam array onto any phase map. Then, we present experiments on its implementation in a seven-fiber laser array. To the best of our knowledge, this is the first time that an actual array of fiber amplifiers has been successfully phase-locked and controlled by an NN.

In the following section, we first briefly recall the principle of the approach, as detailed in [22]. Then, we describe the improved version of the NN implemented in the QRL process, which allows real-time adaptive changes of the desired phase map for the laser beam array. Finally, we present experimental phase control with the QRL approach in the dynamic environment of a fiber laser array. These experiments show that the iterative phase-locking process converges to any static or dynamic desired phase relationship with a correction loop bandwidth over 1 kHz.

#### **2. Neural Network in a Phase Reduction Loop with Quasi-Reinforcement Learning**

The system we previously proposed to control the phase of a laser beam array [22] (laser fields of complex amplitude $z$ with unknown phases) is described in Figure 1. It is composed of (i) a diffuser that maps the individual phases into intensity through scattering, (ii) a photodetector array that converts optical intensity into voltage, and (iii) an NN that processes the electrical signal and provides correction commands to an array of phase modulators. The NN serves to perform the inverse of the transformation achieved by the diffuser. From sparse samples (measurements $b^2$) of the scattered intensity pattern, it predicts a value $\hat{z}$ of the individual laser fields in the array. Knowledge of the presumed phase set $\arg(\hat{z})$ and of the desired phase set $\arg(z_d)$ then permits computation of the correction $\arg(z_d) - \arg(\hat{z})$, which serves as a command to the phase modulators. The high performance of the scheme, as demonstrated numerically and in a proof-of-principle experiment, relies on its specific QRL training. This training consists in an optimization of the NN parameters that takes into account the looped operation of the system for a fixed number of iterations $T$. For each round in the loop, the optimization seeks the highest reward, i.e., the smallest difference between the phases after correction and the desired phases. QRL also bears some resemblance to the learning of a recurrent neural network, although with some peculiarities. First experiments [22] showed that, unlike NNs trained for direct (one-step) phase retrieval [18,20], an NN specifically trained for phase correction in an error reduction loop remains efficient and accurate for an array with a large number of beams (100) and for correction of phase angles on the full circle $[-\pi, +\pi]$. To preserve accuracy, the total number of iterations in the loop during training must be determined empirically, as it varies slightly with the array size and with the number of intensity samples in the diffraction pattern. Most of the time it was close to $T = 6$. Once in operation, the trained NN brings the initial distorted phase front onto the desired one in at most six corrections.
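A single pass of the correction described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `correction_step` and the three-beam values are hypothetical, and the trained NN is replaced by an idealized predictor that returns the true fields, an assumption made only to show how the modulator command is formed.

```python
import cmath
import math


def correction_step(z, z_hat, z_d):
    """Apply one phase correction to the array fields.

    z     : actual complex fields of the array (unknown to the controller)
    z_hat : fields predicted from the scattered intensity samples
    z_d   : fields carrying the desired (target) phases
    """
    corrected = []
    for zk, zhk, zdk in zip(z, z_hat, z_d):
        # Command sent to the phase modulator: arg(z_d) - arg(z_hat)
        phi = cmath.phase(zdk) - cmath.phase(zhk)
        corrected.append(zk * cmath.exp(1j * phi))
    return corrected


# Hypothetical 3-beam example with unit amplitudes.
z = [cmath.exp(1j * p) for p in (0.3, -1.2, 2.5)]                    # distorted
z_d = [cmath.exp(1j * p) for p in (0.0, math.pi / 2, -math.pi / 2)]  # target

# Idealized predictor (assumption): the prediction equals the true field.
z_new = correction_step(z, z_hat=z, z_d=z_d)
residual = max(abs(a - b) for a, b in zip(z_new, z_d))
print(f"largest residual after one ideal step: {residual:.2e}")
```

With an imperfect predictor, a single step leaves a residual phase error, which is why the scheme operates as a loop over several iterations.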

**Figure 1.** Principle of the system for phase-locking a coherent beam array with a neural network. In a preliminary step, quasi-reinforcement learning (QRL) trains the NN specifically for working in a feedback loop and for setting the array output to a given target phase chart. BS denotes beam splitter.

#### **3. Target Adaptive NN with QRL Process**

With the previous NN version [22], the laser beam array could be locked onto the in-phase state or any other arbitrary target phase set. However, the NN had to be trained for the desired target phase set, which rules out a fast change of target because of the duration of the training. To circumvent this drawback, we propose implementing a target adaptive neural network (TANN) in the QRL scheme. With this new version, the target phase set can be changed on demand during laser system operation.

The idea is to build a network, TANN, that computes the set of parameters of the NN used in the phase-lock loop. TANN takes the vector of target phases as an input and returns the weights of the NN. Each time the desired phase profile is modified, the NN parameters are computed again. The calculation is extremely fast (a matrix-vector product) and thus offers almost real-time adaptive wavefront shaping. The new adaptive phase-locking and phase-profiling system is shown schematically in Figure 2.

**Figure 2.** Feedback loop with target adaptive neural network TANN that computes the weights of the NN embedded in the loop, at each change of the target.
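The mechanism of Figure 2 can be sketched in a few lines, assuming the linear, bias-free complex form used in this section, where the weights are obtained as Reshape(*Uz<sub>d</sub>*) and the correction model is a single matrix product. Everything specific below is an assumption for illustration: the helper names, the row-major reshape, the value *m* = 21, and the random untrained matrix standing in for the trained parameters.

```python
import cmath
import random


def matvec(M, v):
    """Complex matrix-vector product with plain Python lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]


def tann(U, z_d, n, m):
    """Return W = Reshape(U z_d) as an n-by-m complex matrix.

    The row-major layout of the reshape is an assumption of this sketch.
    """
    flat = matvec(U, z_d)                      # vector of length n*m
    return [flat[i * m:(i + 1) * m] for i in range(n)]


def nn(W, b):
    """Bias-free complex correction model NN(b) = W b."""
    return matvec(W, b)


n, m = 7, 21                                   # m is an illustrative value
rng = random.Random(0)

# Untrained stand-in for the trainable map U in C^{nm x n}.
U = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
     for _ in range(n * m)]

# Target fields with unit amplitude; here a linear phase ramp.
z_d = [cmath.exp(1j * 0.5 * k) for k in range(n)]
W = tann(U, z_d, n, m)                         # recomputed at each target change

b = [rng.random() for _ in range(m)]           # placeholder amplitude samples
z_hat = nn(W, b)                               # predicted fields, length n
print(len(W), len(W[0]), len(z_hat))           # 7 21 7
```

Changing the target only requires calling `tann` again with the new `z_d`, which is a single matrix-vector product and involves no retraining.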

TANN takes as input a vector $z_d \in \mathbb{C}^n$ of laser fields with target phases and returns the set of parameters that defines the correction model for the given target. We recall that in [22] we used $\mathrm{NN}(b) = W_2(W_1 b + \beta_1) + \beta_2$ as a correction model fed by the square root of the measurements $b^2$, where the parameters were $W_1 \in \mathbb{R}^{4n \times m}$, $W_2 \in \mathbb{R}^{2n \times 4n}$, $\beta_1 \in \mathbb{R}^{4n}$, $\beta_2 \in \mathbb{R}^{2n}$ for $n$ beams and $m > n$ measurements. In this context, TANN should return a real vector of dimension $4nm + 8n^2 + 6n$, which is then split into several parts to define $W_{1,2}$ and $\beta_{1,2}$.
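The dimension quoted above follows directly from the four parameter shapes. The short check below adds them up for a seven-beam array. The function name is hypothetical and the value *m* = 21 is only an illustrative assumption, since any *m* > *n* would do.

```python
def two_layer_param_count(n, m):
    """Count the real parameters of NN(b) = W2 (W1 b + beta1) + beta2."""
    w1 = 4 * n * m        # W1 in R^{4n x m}
    w2 = 2 * n * 4 * n    # W2 in R^{2n x 4n}
    beta1 = 4 * n
    beta2 = 2 * n
    return w1 + w2 + beta1 + beta2


n, m = 7, 21              # m = 21 is an illustrative assumption
total = two_layer_param_count(n, m)
assert total == 4 * n * m + 8 * n**2 + 6 * n
print(total)              # 1022
```

If the map from the *n* target fields to this vector is itself a dense linear layer (an assumption at this point), it needs on the order of *n* times as many weights, which is what motivates shrinking the correction model.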

This means that TANN itself has a minimum of $O(n^3)$ parameters to train, which calls for reducing the number of parameters in the correction NN as much as possible. Note that the NN in [22] is a simple affine transform $Wb + \beta$, where $W = W_2 W_1$ and $\beta = W_2 \beta_1 + \beta_2$. This smaller form decreases the number of parameters in the NN model to $2nm + 2n$. It was also observed empirically that the bias $\beta$ did not have a great impact on the NN's correction capability. Let us therefore consider a new correction model of the form $\mathrm{NN}(b) = Wb$. However, instead of using the real matrix $W \in \mathbb{R}^{2n \times m}$, which computes real and imaginary parts separately, we change it to a fully complex form $W \in \mathbb{C}^{n \times m}$. This smaller model has numerical properties similar to those of the model in [22], but it was not used there because it required more time to train, an important factor when working with 100 beams. The architecture of TANN is a simple linear map $U$ from the vector of desired laser fields $z_d \in \mathbb{C}^n$ to the vector of NN parameters, the output of which is reshaped into a matrix: $\mathrm{TANN}(z_d) = \mathrm{Reshape}(U z_d)$, where $\mathrm{Reshape} : \mathbb{C}^{mn} \to \mathbb{C}^{n \times m}$ and the trainable parameters are $U \in \mathbb{C}^{nm \times n}$. The learning process is similar to [22] and is presented in Algorithm 1, where the reward function is a resemblance parameter between the actual array phase $\arg(z)$ and the computed recovered array phase $\arg(\hat{z})$. It is defined as:

$$\mathcal{R}(z,\hat{z}) = \frac{|\langle z,\hat{z}\rangle|^2}{\langle |z|, |\hat{z}| \rangle^2} \tag{1}$$

Its maximum equals 1 if and only if $\arg(z) = \arg(\hat{z})$ up to a constant. In the framework of laser phase-locking, $\mathcal{R}(z, \hat{z})$ is equal to the phasing quality $Q$, also called combining efficiency, which measures how close the controlled array wavefront is to uniformity. It is usually assumed that in practice an RMS deviation of $\lambda/30$ is a very good value, which corresponds to $Q = 0.96$ [23]. Therefore, this value sets the minimum reward to reach during the training of the TANN.
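The behavior of the reward can be checked numerically. The sketch below reads the denominator of Equation (1) as the squared inner product of the moduli, which is an assumption consistent with the stated property that the maximum is reached when the phases agree up to a constant. The function name and the test vectors are hypothetical, and the last line relies on the common approximation *Q* ≈ exp(−σ²) for a large array with small Gaussian phase errors, which is not taken from the text.

```python
import cmath
import math


def reward(z, z_hat):
    """R = |<z, z_hat>|^2 / <|z|, |z_hat|>^2 for two complex field vectors."""
    inner = sum(a * b.conjugate() for a, b in zip(z, z_hat))
    norm = sum(abs(a) * abs(b) for a, b in zip(z, z_hat))
    return abs(inner) ** 2 / norm ** 2


z = [cmath.exp(1j * p) for p in (0.0, 0.4, -1.1, 2.0)]

# Same phases up to a constant offset: the reward reaches its maximum.
shifted = [zk * cmath.exp(1j * 0.7) for zk in z]
print(round(reward(z, shifted), 6))        # 1.0

# Opposite phase on half of the beams: the contributions cancel.
flipped = [z[0], z[1], -z[2], -z[3]]
print(round(reward(z, flipped), 6))        # 0.0

# lambda/30 RMS phase error, using the approximation Q ~ exp(-sigma^2).
sigma = 2 * math.pi / 30
print(round(math.exp(-sigma ** 2), 2))     # 0.96
```

Because a global phase offset leaves the reward unchanged, the measure only penalizes relative phase errors between beams, which are the ones that degrade the combined beam.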

As in [22], the NN, which now depends on the target, computes a correction as a complex vector instead of a vector of phases. To accelerate the learning, we use a batch of targets $z_d \in \mathbb{C}^{N \times n}$ and signals $z \in \mathbb{C}^{N \times P \times n}$, where $N$ and $P$ denote positive integers. The batch of the form $z \in \mathbb{C}^{N \times P \times n}$ means that we generate $P$ initial signals to correct for each of the $N$ targets during
