Article

Linear Optics Calibration in a Storage Ring Based on Machine Learning

1 Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
2 Shanghai Institute of Applied Physics, Chinese Academy of Sciences, Shanghai 201800, China
3 School of Physical Science and Technology, ShanghaiTech University, Shanghai 201210, China
4 Laboratory for Ultrafast Transient Facility, Chongqing University, Chongqing 401331, China
5 University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(14), 8034; https://doi.org/10.3390/app13148034
Submission received: 15 June 2023 / Revised: 3 July 2023 / Accepted: 5 July 2023 / Published: 10 July 2023

Abstract

Various errors inevitably occur in an actual storage ring, such as magnetic field errors, magnet misalignments, and ground settlement deformation, which cause closed orbit distortion and tune shift. Therefore, linear optics calibration is an essential procedure for storage rings. In this paper, we introduce a new method that uses machine learning to calibrate linear optics. This method differs from the traditional linear optics from closed orbit (LOCO) method, which is based on singular value decomposition (SVD); the machine learning model requires no SVD computation. Our study shows that the machine-learning-based method can significantly reduce the difference between the model response matrix and the measured response matrix by adjusting the strength of the quadrupoles.

1. Introduction

The synchrotron light source is one of the most powerful tools in modern science and technology, providing highly brilliant and highly stable synchrotron radiation that meets the demanding experimental requirements of frontier research [1,2]. For third- or fourth-generation synchrotron light sources to maintain good performance, it is important to keep the electron beam highly stable, which involves orbit stability, current stability, source-size stability, control of collective effects, and energy stability [3,4,5,6,7]. This paper discusses linear optics calibration, an essential aspect of stable beams.
During the design stage of a storage ring, considerable effort is made to optimize the storage ring parameters such as the tune, lifetime, natural beam emittance, and dynamic aperture (DA). Ideally, the storage ring parameters are at their design values and the beam moves on a closed orbit in the storage ring. However, the actual machine often differs from the theoretical model. Not only do magnetic field errors and magnet misalignments distort the closed orbit, but thermal expansion and contraction, ground settlement deformation, and other factors can also cause various displacements and mechanical vibrations of the storage ring magnets, leading to closed orbit distortion and tune shift [8]. These unwanted errors can increase the beam emittance and decrease the dynamic and momentum apertures. In addition, they degrade the quality of experimental results in the beamlines. Therefore, it is important to apply an effective method for measuring the beam optics parameters, complete the linear optics calibration, and, finally, ensure successful storage ring operation [9].
The Shanghai Synchrotron Radiation Facility (SSRF), located in Shanghai, China, is based on a 3.5 GeV storage ring with an average beam current of more than 200 mA [10]. The initial SSRF storage ring consisted of 20 standard double-bend achromat (DBA) cells arranged in four super-periods; its lattice was later upgraded by replacing two DBA cells with cells containing high-magnetic-field super-bends [11]. In total, there are 206 quadrupoles, 140 sextupoles, 60 skew quadrupoles, and 48 bending magnets in the SSRF storage ring; the skew quadrupole fields are generated by auxiliary coils on sixty of the sextupoles. Moreover, a total of 138 beam position monitors (BPMs) and 80 correctors are distributed throughout the storage ring. Having so many magnets in the storage ring makes commissioning a major challenge for SSRF.
The linear optics from closed orbit (LOCO) algorithm is an effective tool for calibrating linear optics [12]. It is based on the orbit response matrix, which represents the change in orbit at the BPMs with respect to changes in corrector strength. As long as the phase difference between the correctors and BPMs is suitable, the closed orbit response matrix reflects the linear optics information of the storage ring [13]. By adjusting the model lattice parameters and fitting the model response matrix to the measured matrix, we can determine the actual machine errors. The quadrupole strengths that restore the actual machine to its designed state can then be found accurately.
Currently, machine learning is increasingly being used at accelerators worldwide [14,15]. For example, machine-learning-based methods are used to optimize nonlinear storage ring problems and to find the multiobjective Pareto front [16,17]. A study at Lawrence Berkeley National Laboratory (LBNL) showed that machine learning algorithms can replace traditional feedforward methods to counteract insertion device gap or phase-motion-induced perturbations on the Advanced Light Source (ALS) electron beam [6]. Another new machine learning application is polynomial neural networks for fitting beam dynamics parameters in accelerators [18]. In addition, researchers from the Institute of High Energy Physics have proposed a DA prediction method based on machine learning, which can reduce the computational cost of DA tracking by approximately an order of magnitude while maintaining sufficiently high evaluation accuracy [19]. Y. Lu et al. proposed ML-MOGA as a full-fledged replacement for the traditional Tr-MOGA in optimizing 4GSR lattices [17]; their research opened up new possibilities for considerable speedups of traditionally lengthy computational processes.
In this paper, we explore the application of machine learning to linear optics calibration, a new approach to this problem. In practice, the LOCO method minimizes the difference between the model response matrix and the measured matrix through SVD calculations and multiple iterations, whereas the machine learning model predicts the quadrupole errors directly from the response matrix. Our method provides a new option for linear optics calibration of a storage ring. This paper is organized as follows. Section 2 explains the theory of linear optics calibration and describes the machine learning theory, including the experimental method and the construction of convolutional neural networks (CNNs). Section 3 documents the experimental research on machine learning, including the results and their analysis. Finally, Section 4 provides a brief summary of this paper.

2. Materials and Methods

2.1. Theory of Linear Optics Calibration

2.1.1. Classical Theory

If the linear optics of a storage ring are known, its orbit response matrix can easily be obtained, for either a model or a real machine. LOCO is the inverse procedure: the linear optics of the storage ring are calibrated from the orbit response matrix. The beta functions, magnetic gradients, BPM gains and coupling, and corrector gains and coupling of the real storage ring are accurately reproduced by minimizing the difference between the model and the measured response matrix. In this way, calibration errors are found and the optical functions are corrected. Safranek first proposed the LOCO method to correct the optical parameters of the NSLS VUV ring and the ALS storage ring in the 1990s [20]. Since then, LOCO has been used at many accelerators worldwide to control accelerator optics during machine development and to maintain optimal conditions for standard machine operation. In 2002, James Safranek and Xiaobiao Huang ported LOCO to MATLAB with a graphical user interface to facilitate data analysis [21]. In recent years, LOCO has become a powerful tool for optics correction in modern light sources around the world [22].
In SSRF, the MATLAB-based LOCO is also used for optics correction [23]. There are 138 BPMs in the SSRF storage ring, and each BPM measures one orbit position. Therefore, we can define a vector $\Delta u$, with units of meters and dimension 138, to represent the closed orbit distortion: $\Delta u = (u_1, u_2, \ldots, u_m, \ldots, u_{138})^T$, where $u_m$ is the closed orbit distortion in the horizontal or vertical direction at the $m$-th BPM. We then define a vector $\Theta$, with units of radians and dimension 80, to represent the strength change of the correctors: $\Theta = (\theta_1, \theta_2, \ldots, \theta_n, \ldots, \theta_{80})^T$, where $\theta_n$ is the strength change of the $n$-th corrector in the horizontal or vertical direction. The relationship between the closed orbit distortion and the corrector strength change is given by $\Delta u = R\,\Theta$, where $R$ is the orbit response matrix with 138 rows and 80 columns. In practice, the orbit response matrix is usually obtained by measurement rather than calculated from a formula.
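As a concrete illustration, this measurement can be sketched as a difference measurement over all correctors. The outline below is a minimal Python/numpy sketch; `set_corrector` and `read_bpms` are hypothetical control-system calls, and the kick size is an illustrative value rather than the operational one.

```python
import numpy as np

N_BPM, N_COR = 138, 80
KICK = 1e-5  # corrector strength change in rad (illustrative value)

def measure_response_matrix(set_corrector, read_bpms):
    """Difference orbits for a +/- kick on each corrector."""
    R = np.zeros((N_BPM, N_COR))
    for n in range(N_COR):
        set_corrector(n, +KICK)
        u_plus = read_bpms()       # closed orbit at all BPMs, shape (138,)
        set_corrector(n, -KICK)
        u_minus = read_bpms()
        set_corrector(n, 0.0)      # restore the corrector
        R[:, n] = (u_plus - u_minus) / (2.0 * KICK)  # m/rad
    return R
```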
The difference between the model and the measured response matrix is $\Delta R = R_{\mathrm{model}} - R_{\mathrm{meas}}$, where $R_{\mathrm{model}}$ is the model matrix and $R_{\mathrm{meas}}$ is the measured matrix. $\Delta R$ is sensitive to BPM gains and their coupling in the $x$ and $y$ planes, corrector gains and coupling, bending magnet rotations, quadrupole strengths and rotations, sextupole transverse offsets, and so on. LOCO exploits this sensitivity of the orbit response matrix to magnet errors to adjust the linear optics. Its goal is to minimize $\Delta R$ by changing the sensitive machine parameters $\Delta W$ listed above. The linear relationship between $\Delta R$ and $\Delta W$ is given by $\Delta R = J\,\Delta W$, where $J$ (11,040 × $N$) is called the Jacobian matrix [20]. In practice, $\Delta R$ also needs to be divided by the measured rms noise levels of the BPMs. Through singular value decomposition and multiple iterations, the fitting parameters $\Delta W$ that minimize the difference between the model response matrix and the measured matrix are obtained. Finally, a calibrated model of the machine is obtained, and the linear optics are corrected.
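The SVD-based fitting step can be sketched as below, assuming the Jacobian $J$ and the flattened, noise-weighted residual are already available; the singular-value cutoff `rcond` is an illustrative choice, not a value from this work.

```python
import numpy as np

def loco_fit_step(J, dR, bpm_noise, rcond=1e-3):
    """One least-squares update of the fit parameters via truncated SVD."""
    dR_w = dR / bpm_noise             # weight residual by measured BPM noise
    J_w = J / bpm_noise[:, None]      # weight Jacobian rows the same way
    U, s, Vt = np.linalg.svd(J_w, full_matrices=False)
    s_inv = np.zeros_like(s)
    keep = s > rcond * s[0]           # drop small singular values
    s_inv[keep] = 1.0 / s[keep]
    dW = Vt.T @ (s_inv * (U.T @ dR_w))   # pseudo-inverse solution
    return dW
```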
The theory is described above, but practice at SSRF differs in some respects. As mentioned in the introduction, the SSRF storage ring contains many magnets, which leads to many fitting parameters. With many parameters, the fitting process often suffers from low accuracy and slow convergence. Therefore, in routine LOCO operation at SSRF, we rarely include the sextupoles in the fitting parameters and only consider the influence of the quadrupoles. There are 206 quadrupoles at SSRF, classified into 45 families. In operation, fitting by family is usually used as a first step, and fitting by individual magnet is used when necessary.

2.1.2. Machine Learning Theory

A new idea for linear optics calibration is the application of machine learning. In recent years, with the maturing of machine learning and neural networks, data-driven methods have been applied in many different fields [24]. They require no assumptions or prior knowledge: through a neural network model, they provide a completely data-based information mining tool. With a machine-learning-based method, the machine parameters can be predicted from the measured orbit response matrix.
Machine learning uses neural networks, made up of many neurons, to fit data and solve complex problems. Neurons, the most basic components of neural networks, were proposed by McCulloch and Pitts in 1943 [25]. A neuron takes input from other neurons or from external data and computes an output: the input data are weighted, a bias is added, and the result is passed through an activation function. Figure 1 shows a schematic illustration of a neuron.
In a dense neural network, each layer is composed of one or more neurons, and every neuron in one layer connects to every neuron in the next. Dense neural networks cannot handle multidimensional data well, but CNNs overcome this shortcoming. The greatest problem with using dense neural networks to process multidimensional data is the excessive number of parameters in the dense layers; besides slowing down computation, a large parameter count easily leads to overfitting. A suitable network structure is needed to effectively reduce the number of parameters, and CNNs achieve this goal through their sparse connectivity and weight-sharing characteristics [26].
The first CNN was the time-delay neural network (TDNN) proposed by Alexander Waibel et al. in 1987 [27], which was applied to speech recognition. In 1989, Yann LeCun constructed a CNN for computer vision that was successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service; he was the first to use the word ‘convolution’ in describing his network structure [28]. Since the beginning of the 21st century, with the introduction of deep learning theory and improvements in computing hardware, CNNs have developed rapidly and been applied to computer vision, natural language processing, and other fields [29,30,31].
CNNs mimic the visual perception mechanism of organisms: when neurons with the same parameters are applied at different positions of the previous layer, shift-invariant features are obtained. A CNN is a type of feedforward neural network that includes convolutional computation and can perform both supervised and unsupervised learning [29]. Its layers can be classified into the input layer, hidden layers, and output layer. Input data enter the input layer, pass through the hidden layers for computation, and the results are transmitted to the output layer; this is the forward propagation of signals from input to output. The CNN input layer can process multidimensional data, usually three-dimensional data. The hidden layers contain convolutional layers; pooling layers and dense layers are optional. The function of a convolutional layer is to extract features from the input data. It contains multiple kernels, and each element of a kernel corresponds to a weight, analogous to a neuron. A convolutional layer processes data in two steps: first convolution, then activation. In the convolution calculation, the kernels periodically scan the input data and perform element-wise multiplication and summation over the receptive field, whose size is determined by the kernel size. Figure 2 shows an example of convolution with a stride of 1. Each kernel has a bias, and each convolutional layer has an activation function: after convolution, the bias is added and the activation function is applied to obtain the output data of the convolutional layer. Assuming a kernel size of $p \times q$, the output of a single kernel can be written as:
$$O_{x,y} = f\left( \sum_{i=1}^{p \times q} w_i I_i + b \right),$$
where $I$ is the input data, $w_i$ are the weights, $b$ is the bias, $f$ is the activation function, $O$ is the output data, and $(x, y)$ is the spatial coordinate of the output. The output data are also called the feature map. With $n$ kernels, the feature map is expanded to $n$ dimensions, representing the $n$ features extracted from the input data.
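For concreteness, the single-kernel formula above can be written out directly in numpy. This minimal sketch uses stride 1, no padding, and ReLU as the activation $f$; the toy input and kernel are illustrative.

```python
import numpy as np

def conv2d_single_kernel(I, w, b):
    """Stride-1, no-padding convolution of input I with one p x q kernel."""
    p, q = w.shape
    H, W = I.shape
    O = np.zeros((H - p + 1, W - q + 1))
    for x in range(O.shape[0]):
        for y in range(O.shape[1]):
            # element-wise multiply over the receptive field, sum, add bias
            O[x, y] = np.sum(w * I[x:x + p, y:y + q]) + b
    return np.maximum(O, 0.0)  # activation f = ReLU

I = np.arange(16.0).reshape(4, 4)  # toy 4x4 input
w = np.ones((3, 3)) / 9.0          # 3x3 averaging kernel
print(conv2d_single_kernel(I, w, b=0.0))  # 2x2 feature map
```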
In a CNN, the samples are collections of features and can be labeled or unlabeled. The labels are the true values of the predicted properties and can be understood as the correct answers; the features are the inherent characteristics from which those values are obtained, and they usually serve as the input variables of the neural network model. Labeled samples are used for model training, and unlabeled samples are used for model prediction. In a good neural network model, the output should be as close as possible to the labels corresponding to the input features. For different types of data, different loss functions are usually chosen to measure the error between the output signals and the labels. Training a CNN then amounts to finding a set of $w$ and $b$ that minimizes the loss function. For supervised learning, error backpropagation, which LeCun applied to CNN training in 1989 [28], is the most common optimization algorithm. The error is propagated backward from the output layer until it reaches the input layer; in this process, the gradients of $w$ and $b$ in the network are computed using the chain rule, and the parameters are changed in the direction opposite to the gradient. This is called gradient descent optimization [32]. The parameter updates of the gradient descent method can be written as:
$$w_n \leftarrow w_n - \alpha \frac{\partial E}{\partial w_n},$$
and
$$b_n \leftarrow b_n - \alpha \frac{\partial E}{\partial b_n},$$
where $E$ is the error computed by the loss function and $\alpha$ is the learning rate, which controls the size of the weight adjustments. If the learning rate is set too small, the model converges slowly; if it is set too large, training may overshoot the optimum. When the training process is complete, the resulting CNN should be able to make predictions for unlabeled samples.
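A one-dimensional toy example makes the update rules concrete; the model, sample, and learning rate below are illustrative only.

```python
# Fit y = w*x + b to a single target point (x, t) by gradient descent on
# the squared error E = (w*x + b - t)^2.
x, t = 2.0, 5.0      # toy sample and target (illustrative)
w, b = 0.0, 0.0      # initial parameters
alpha = 0.05         # learning rate

for _ in range(200):
    y = w * x + b            # forward pass
    dE_dy = 2.0 * (y - t)    # chain rule: dE/dy
    w -= alpha * dE_dy * x   # w <- w - alpha * dE/dw
    b -= alpha * dE_dy       # b <- b - alpha * dE/db

print(w, b)  # w*x + b now approximates t
```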
We can then use the trained model to achieve our physical goals. We want the CNN to output the quadrupole strengths corresponding to an input $\Delta R$. Figure 3 shows a schematic illustration of the machine-learning-based method for linear optics calibration; more details of the experimental method are discussed in the next section. The machine-learning-based method in effect establishes a nonlinear mapping between $\Delta R$ and the quadrupole errors, acting as a “black box”. In this way, the quadrupole strengths can be predicted directly from the measured response matrix data. Unlike the traditional LOCO method, the new method requires neither SVD calculations nor multiple iterations.

2.2. Experimental Methods on Machine Learning

2.2.1. Building the CNN Model

Our machine learning algorithm is implemented in Python. We use Keras, a high-level API built on TensorFlow, as the deep learning framework [33,34]. TensorFlow, which is based on dataflow programming, is widely used for implementing machine learning algorithms.
After selecting the machine learning framework, we started to build the neural network model. We chose the root mean square error (RMSE) as the loss function and the mean absolute error (MAE) as the metric function. The metric function is optional: it is not directly involved in model optimization but describes model performance, and MAE characterizes the deviation between the predicted values and the labels well. The model is trained using the error backpropagation method with the AMSGrad optimizer. AMSGrad is a variant of the Adam optimizer [35] proposed by S.J. Reddi, S. Kale, and S. Kumar. By analyzing the convergence proof of the Adam optimizer, they found errors in its update rules that can cause the algorithm to converge to a suboptimal point. They designed experiments to demonstrate Adam’s failure scenarios and proposed AMSGrad as a solution [36].
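In Keras, these choices can be sketched as follows. The RMSE loss is written here as a custom function (we are not relying on a built-in RMSE loss), and AMSGrad is enabled through the `amsgrad` flag of the Adam optimizer; the learning rate anticipates the value selected in Section 3.

```python
import tensorflow as tf

def rmse_loss(y_true, y_pred):
    # root mean square error over the predicted quadrupole errors
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

# AMSGrad is a flag of the Adam optimizer in Keras
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0015, amsgrad=True)
```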
Our network takes the response matrix as input and outputs the corresponding quadrupole errors. In SSRF, the response matrix has 276 rows and 160 columns, comprising four 138 × 80 blocks: XX, XY, YX, and YY. XY and YX are coupling terms and are not relevant for quadrupole calibration. Therefore, we remove XY and YX from the response matrix, leaving the XX and YY blocks, and stack them into a new matrix with 276 rows and 80 columns; this becomes our input data. For the output data, the 206 quadrupoles in SSRF are classified into 45 families, so the output is a vector whose elements are the errors of the 45 quadrupole families. The topology of the neural network is particularly important. Deeper networks with more hidden layers and wider networks with more kernels per layer can generally fit the training data better, but a large model is prone to overfitting and requires more computational resources and longer training times. We tested different network structures, considering both accuracy and convergence speed, and finally chose a 10-layer CNN with 8 kernels per layer, each of size 3 × 3. Since the output must be a vector, the data pass through a flattening layer after the convolutional layers. The final output layer is a dense layer, in which all neurons of the previous layer connect to all neurons of the next. We use the rectified linear unit (ReLU) as the activation function to simulate how neurons respond according to signal strength [37]. The ReLU function is defined as follows:
$$f(x) = \max(0, x).$$
The rectified linear unit function has the characteristics of unilateral suppression and a relatively wide excitation boundary. Rectifying neurons can produce sparse representations with true zeros and have shown equal or better performance than hyperbolic tangent networks [38]. In our model, ReLU learns faster than the other activation functions we tried and yields better results. The next step is to initialize the weights of the neural network. The purpose of weight initialization is to prevent exploding or vanishing gradients in the forward propagation of deep neural networks; these lead to very large or very small derivatives during backpropagation training, which makes the weights blow up or stop updating. We initialize the weights from a normal distribution, which improves the convergence speed and performance of our model. Figure 4 shows the construction of our CNN.
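A Keras sketch of this architecture is given below. We read “10-layer CNN with 8 kernels” as ten convolutional layers with eight 3 × 3 kernels each; the padding mode, the initializer’s standard deviation, and the input shape ordering are assumptions, and `rmse_loss` and `optimizer` are those sketched in the previous block.

```python
from tensorflow import keras
from tensorflow.keras import layers

init = keras.initializers.RandomNormal(mean=0.0, stddev=0.05)  # assumed spread

model = keras.Sequential([keras.Input(shape=(276, 80, 1))])  # stacked XX/YY blocks
for _ in range(10):  # ten convolutional layers, 8 kernels of 3x3 each
    model.add(layers.Conv2D(8, (3, 3), padding="same", activation="relu",
                            kernel_initializer=init))
model.add(layers.Flatten())  # vectorize the feature maps
model.add(layers.Dense(45))  # one output per quadrupole family

# rmse_loss and optimizer as defined in the earlier sketch
model.compile(optimizer=optimizer, loss=rmse_loss, metrics=["mae"])
```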

2.2.2. Training Data Acquisition and Preprocessing

To train the neural network, training data are crucial, and their quality determines the model performance. Using the Accelerator Toolbox (AT) [39], we can compute the orbit response matrix corresponding to a lattice. By changing the strength $k$ of the 45 quadrupole families based on the model lattice, we obtain lattices with errors and, in turn, the orbit response matrices corresponding to those errors. We chose error strengths from 1000 to 5000 parts per million of the nominal quadrupole strength $k$ (in m$^{-2}$), with a total of 50,000 seeds, to simulate the errors of the real machine.
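A sketch of this data-generation loop is shown below. `compute_response_matrix` stands in for the AT computation and `R_model` for the ideal model matrix, both hypothetical here; the sampling of the 1000–5000 ppm range is our reading of the specification, not the paper’s exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
N_SEEDS, N_FAMILIES = 50_000, 45

X, Y = [], []
for _ in range(N_SEEDS):
    # 1000-5000 ppm error per family with random sign (assumed sampling)
    dk = rng.uniform(1e-3, 5e-3, N_FAMILIES) * rng.choice([-1.0, 1.0], N_FAMILIES)
    R_err = compute_response_matrix(dk)  # hypothetical AT-based helper
    X.append(R_err - R_model)            # input: deviation from the ideal model
    Y.append(dk)                         # label: quadrupole family errors

X, Y = np.asarray(X), np.asarray(Y)      # (50000, 276, 80), (50000, 45)
```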
The data cannot be used directly: after removing outliers, they must be preprocessed. The collected data are grouped by matrix element and by quadrupole family, giving 22,080 groups of matrix-element data and 45 groups of quadrupole data. Each group must be standardized (standard scaling) to improve the learning efficiency and performance of the CNN. The formula for standard scaling is as follows:
$$x' = \frac{x - \mu}{\sigma},$$
where $\mu$ is the mean of the samples and $\sigma$ is their standard deviation. Through standard scaling, each feature is scaled to zero mean and unit variance without changing the distribution of the original data, so features from different groups become numerically comparable. In this way, all matrix elements are reduced to the same range for training, which reduces the influence of large-variance data, improves the generalization ability of our model, and optimizes the training results.
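A minimal numpy version of this per-group scaling, keeping the means and standard deviations so that predictions can later be mapped back to physical units:

```python
import numpy as np

def standard_scale(X):
    """Scale each feature/group to zero mean, unit variance across seeds."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

X_scaled, x_mu, x_sig = standard_scale(X)  # 22,080 matrix-element groups
Y_scaled, y_mu, y_sig = standard_scale(Y)  # 45 quadrupole-family groups
# predictions are mapped back with: dk_pred = y_pred * y_sig + y_mu
```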

3. Results

A single training session on an RTX 3080 GPU took approximately two and a half hours; once training was complete, however, the model produced predictions immediately after data input. The 50,000 datasets were used as training data, and we generated another 10,000 datasets with the same method as test data. The test data were not used for training, only for validation. Figure 5 shows the MAE of the training and test data versus the number of epochs, comparing different learning rates; an epoch is one complete pass of the training dataset through the network. The figure shows that with a learning rate of 0.003, the MAE of the training and test data does not converge. When the learning rate is reduced to 0.0015 or 0.0005, the MAE converges and is essentially stable after 100 epochs, although at 0.0005 the convergence is slower. Therefore, the learning rate was finally set to 0.0015. In addition, no obvious overfitting was observed.
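The training runs behind Figure 5 can be sketched as below; `build_model` is a hypothetical wrapper around the architecture of Section 2.2.1, `X_test`/`Y_test` are the held-out data scaled in the same way as the training data, and the batch size is an assumption.

```python
from tensorflow import keras

histories = {}
for lr in (0.003, 0.0015, 0.0005):  # learning rates compared in Figure 5
    model = build_model()           # hypothetical wrapper around the CNN above
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr, amsgrad=True),
                  loss=rmse_loss, metrics=["mae"])
    histories[lr] = model.fit(X_scaled, Y_scaled, epochs=150, batch_size=64,
                              validation_data=(X_test, Y_test), verbose=0)
```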
The mean absolute error of the model output is 0.11 after 150 epochs. After inverting the standard scaling, the mean absolute error of the quadrupole strength error $k$ is $3.6 \times 10^{-4}$ m$^{-2}$, i.e., $\langle \lvert \Delta k' - \Delta k \rvert \rangle = 3.6 \times 10^{-4}$ m$^{-2}$, where $\Delta k'$ is the quadrupole strength error predicted by machine learning and $\Delta k$ is the quadrupole strength error in the initial lattice. Since $\langle \lvert \Delta k \rvert \rangle = 2.4 \times 10^{-3}$ m$^{-2}$, the average quadrupole strength error is reduced to approximately 15% of the initial error by machine learning.

3.1. Sample Analysis

We took one sample for analysis; its orbit response matrix is shown in Figure 6. Before the correction, the peak-to-peak values of $\Delta R$ are 4.980 m/rad (x) and 4.260 m/rad (y), and the rms values are 0.791 m/rad (x) and 0.506 m/rad (y). After the correction, the peak-to-peak values are reduced to 0.436 m/rad (x) and 0.388 m/rad (y), and the rms values to 0.066 m/rad (x) and 0.061 m/rad (y). For our lattice, the designed working points are $\nu_x = 22.222$ and $\nu_y = 12.153$. Before the correction, the lattice with the field errors has working points $\nu_x = 22.263$ and $\nu_y = 12.143$; after the correction, they are restored to $\nu_x = 22.224$ and $\nu_y = 12.152$. The peak-to-peak value and the standard deviation of $\Delta R$ are thus significantly reduced. Beta-beating is shown in Figure 7: it decreases rapidly after correction with the machine-learning-based method, which produces an accelerator model with beta-beating below the 1% level and accurate betatron tune correction. This result is comparable to that of the traditional LOCO algorithm, with which the beta-beating can also reach about 1% after correction [13].
To explore the influence of multiple iterations on the results, we iterated the algorithm five times and observed the changes. The variation in the peak-to-peak value and standard deviation of $\Delta R$ over the five iterations is shown in Figure 8, with the ordinate on a logarithmic scale. The first correction yields a large improvement, and subsequent iterations only slight further improvements. In practical operation, multiple iterations require multiple measurements of the response matrix, which takes more time.
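The iterative procedure can be sketched as a simple loop; `measure_response_matrix`, `scale_input`, and `apply_correction` are hypothetical machine/model interfaces, and `y_mu`/`y_sig` are the output-scaling parameters from Section 2.2.2.

```python
for iteration in range(5):
    R_meas = measure_response_matrix()       # remeasure after each step
    dR = scale_input(R_meas - R_model)       # same preprocessing as training
    dk_scaled = model.predict(dR[None, :, :, None])[0]
    dk_pred = dk_scaled * y_sig + y_mu       # undo the output scaling
    apply_correction(-dk_pred)               # subtract the predicted errors
```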

3.2. Test Data Analysis

Next, we plotted the peak-to-peak values and standard deviations of the nearly 10,000 groups of test data that did not participate in training in Figure 9 and Figure 10, together with the correction results after one iteration of the machine learning program. The figures show that after the correction, the peak-to-peak value of most data is reduced to less than 1 m/rad and the standard deviation to less than 0.2 m/rad. The correction effect of the machine learning program is evident.

4. Discussion and Conclusions

This paper investigated the feasibility of linear optics calibration using a machine-learning-based method. We used Python and Keras to create a CNN that can predict the quadrupole errors. Unlike traditional SVD-based LOCO, the machine-learning-based method does not require SVD calculations.
Our study shows that the machine-learning-based method can significantly reduce the difference between the model response matrix and the measured response matrix by calibrating the quadrupole strengths. Using this method, the peak-to-peak value and standard deviation of $\Delta R$ can be reduced to less than 1 m/rad and 0.2 m/rad, respectively, in most cases. As machine-learning-based methods continue to improve, it should become possible to achieve even greater accuracy in quadrupole calibration. Our method provides a new option for the linear optics calibration of a storage ring.

Author Contributions

Conceptualization, R.L. and B.J.; methodology, R.L.; software, R.L.; validation, R.L., B.J. and Q.Z.; formal analysis, R.L.; investigation, R.L., C.L. and K.W.; resources, Z.Z.; data curation, R.L., C.L. and K.W.; writing—original draft preparation, R.L.; writing—review and editing, R.L., Q.Z., B.J., Z.Z., C.L. and K.W.; project administration, Z.Z.; funding acquisition, B.J. and Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research and the article processing charge were funded by the National Natural Science Foundation of China (no. 11975298) and the Youth Innovation Promotion Association of CAS (no. 2020287).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available on request to the authors.

Acknowledgments

We acknowledge support from the Shanghai Synchrotron Radiation Facility, Shanghai Advanced Research Institute, ShanghaiTech University, and Shanghai Institute of Applied Physics.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Xie, H.; Deng, B.; Du, G.; Fu, Y.; He, Y.; Guo, H.; Peng, G.; Xue, Y.; Zhou, G.; Ren, Y. X-ray biomedical imaging beamline at SSRF. J. Instrum. 2013, 8, C08003.
2. Tsakanov, V. Synchrotron Light Facilities and Applications in Life Sciences. In Biomarkers of Radiation in the Environment: Robust Tools for Risk Assessment; Springer: Berlin/Heidelberg, Germany, 2022; pp. 25–36.
3. Jiang, B.C.; Lin, G.Q.; Wang, B.L.; Zhang, M.Z.; Yin, C.X.; Yan, Y.B.; Tian, S.Q.; Wang, K. Multi-bunch injection for SSRF storage ring. Nucl. Sci. Tech. 2015, 26, 050101.
4. Zhang, Q.L.; Jiang, B.C.; Tian, S.Q.; Zhou, Q.G.; Zhao, Z.T. Study on beam dynamics of a Knot-APPLE undulator proposed for SSRF. In Proceedings of the IPAC15, Richmond, VA, USA, 3–8 May 2015; pp. 1669–1671.
5. Jiang, B.C.; Zhao, Z.T.; Liu, G.M. Study of Touschek lifetime in SSRF storage ring. HEP & NP 2006, 30, 693–698.
6. Leemann, S.; Liu, S.; Hexemer, A.; Marcus, M.; Melton, C.; Nishimura, H.; Sun, C. Demonstration of Machine Learning-Based Model-Independent Stabilization of Source Properties in Synchrotron Light Sources. Phys. Rev. Lett. 2019, 123, 194801.
7. Jiang, B.C.; Xia, G.X.; Han, L.F.; Liu, G.M.; Dai, Z.M.; Zhao, Z.T. Investigation of fast ion instability in SSRF. Nucl. Instrum. Methods Phys. Res. Sect. A 2010, 614, 331–334.
8. Bu, L.S.; Zhao, Z.T.; Yin, L.X.; Du, H.W. Vibration control research for the 3rd generation synchrotron light source storage ring mechanical components. Chin. Phys. C 2008, 32, 37–39.
9. Zhao, Z.T.; Xu, H.J.; Ding, H. Commissioning of the Shanghai Light Source. In Proceedings of the PAC09, Vancouver, BC, Canada, 4–8 May 2009; pp. 55–59.
10. Zhao, Z.T.; Xu, H.J. Operational Status of the Shanghai Synchrotron Radiation Facility. In Proceedings of the IPAC10, Kyoto, Japan, 23–28 May 2010; pp. 2421–2423.
11. Zhao, Z.T.; Yin, L.X.; Leng, Y.B.; Jiang, B.C.; Tian, S.Q. Consideration on the future major upgrades of the SSRF storage ring. In Proceedings of the IPAC15, Richmond, VA, USA, 3–8 May 2015; pp. 1672–1674.
12. Husain, R.; Prakash, S.; Ghodke, A. Betatron coupling measurement and optimization in Indus-2 storage ring. Rev. Sci. Instrum. 2021, 92, 053302.
13. Zhang, M.Z.; Chen, J.H.; Tian, S.Q.; Li, H.H.; Liu, G.M.; Li, D.M. Linear optics correction based on LOCO at SSRF storage ring. High Power Laser Part. Beams 2009, 21, 1893.
14. Jinyu, W.; Zheng, S.; Xiang, Z.; Yu, B.; Chengying, T.; Paul, C.; Senlin, H.; Yi, J.; Yongbin, L.; Biaobin, L. Machine learning applications in large particle accelerator facilities: Review and prospects. High Power Laser Part. Beams 2021, 33, 094001.
15. Kaiser, J.; Stein, O.; Eichler, A. Learning-based optimisation of particle accelerators under partial observability without real-world training. In Proceedings of the International Conference on Machine Learning, Xiamen, China, 27–29 May 2022; pp. 10575–10585.
16. Wang, F.Y.; Song, M.; Edelen, A.; Huang, X. Machine learning for design optimization of storage ring nonlinear dynamics. arXiv 2019, arXiv:1910.14220.
17. Lu, Y.; Leemann, S.; Sun, C.; Ehrlichman, M.; Nishimura, H.; Venturini, M.; Hellert, T. Demonstration of machine learning-enhanced multi-objective optimization of ultrahigh-brightness lattices for 4th-generation synchrotron light sources. Nucl. Instrum. Methods Phys. Res. Sect. A 2023, 1050, 168192.
18. Ivanov, A.; Agapov, I. Physics-based deep neural networks for beam dynamics in charged particle accelerators. Phys. Rev. Accel. Beams 2020, 23, 074601.
19. Wan, J.; Jiao, Y. Machine learning enabled fast evaluation of dynamic aperture for storage ring accelerators. New J. Phys. 2022, 24, 063030.
20. Safranek, J. Experimental determination of storage ring optics using orbit response measurements. Nucl. Instrum. Meth. A 1997, 388, 27.
21. Safranek, J. Matlab-based LOCO. In Proceedings of the EPAC02, Paris, France, 3–7 June 2002.
22. Tomás, R.; Aiba, M.; Franchi, A.; Iriso, U. Review of linear optics measurement and correction for charged particle accelerators. Phys. Rev. Accel. Beams 2017, 20, 054801.
23. Shun-Qiang, T.; Wen-Zhi, Z.; Hao-Hu, L.; Man-Zhou, Z.; Jie, H.; Xue-Mei, Z.; Gui-Min, L. Linear optics calibration and nonlinear optimization during the commissioning of the SSRF storage ring. Chin. Phys. C 2009, 33, 83.
24. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
25. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133.
26. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
27. Waibel, A.; Hanazawa, T.; Hinton, G.; Shikano, K.; Lang, K.J. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust. Speech Signal Process. 1989, 37, 328–339.
28. LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation applied to handwritten zip code recognition. Neural Comput. 1989, 1, 541–551.
29. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J. Recent advances in convolutional neural networks. Pattern Recogn. 2018, 77, 354.
30. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
31. Zhang, Y.-D.; Satapathy, S.C.; Guttery, D.S.; Górriz, J.M.; Wang, S.-H. Improved breast cancer classification through combining graph convolutional network and convolutional neural network. Inf. Process. Manag. 2021, 58, 102439.
32. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747.
33. Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M. TensorFlow: A system for large-scale machine learning. In Proceedings of the OSDI16, Savannah, GA, USA, 2–4 November 2016; pp. 265–283.
34. Kim, S.; Wimmer, H.; Kim, J. Analysis of Deep Learning Libraries: Keras, PyTorch, and MXnet. In Proceedings of the 2022 IEEE/ACIS 20th International Conference on Software Engineering Research, Management and Applications (SERA), Las Vegas, NV, USA, 22–25 May 2022; pp. 54–62.
35. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
36. Reddi, S.J.; Kale, S.; Kumar, S. On the convergence of Adam and beyond. arXiv 2019, arXiv:1904.09237.
37. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the ICML10, Haifa, Israel, 21–24 June 2010; pp. 807–814.
38. Glorot, X.; Bordes, A.; Bengio, Y. Deep Sparse Rectifier Neural Networks. J. Mach. Learn. Res. 2011, 15, 315–323.
39. Terebilo, A. Accelerator modeling with MATLAB accelerator toolbox. In Proceedings of the PAC01, Chicago, IL, USA, 18–22 June 2001; pp. 3203–3205.
Figure 1. Schematic illustration of a neuron.
Figure 2. An example of convolution with a stride of 1.
Figure 3. Schematic illustration of the machine-learning-based method for linear optics calibration.
Figure 4. Schematic illustration of the construction of our CNN.
Figure 5. Mean absolute errors of the training and test data.
Figure 6. The orbit response matrix $\Delta R$. (a,b) are the x-direction, (c,d) the y-direction; (a,c) are before and (b,d) after the correction.
Figure 7. Beta-beating of the lattice. (a) before the correction; (b) after the correction.
Figure 8. The peak-to-peak value and standard deviation of $\Delta R$ over multiple iterations. (a) peak-to-peak value; (b) standard deviation.
Figure 9. The peak-to-peak value of $\Delta R$. Each point represents a response matrix. (a,b) are in the x-direction, (c,d) in the y-direction; (b,d) are blowups of (a,c).
Figure 10. The standard deviation of $\Delta R$. Each point represents a response matrix. (a,b) are in the x-direction, (c,d) in the y-direction; (b,d) are blowups of (a,c).