1. Introduction
The nuclear charge radius is one of the most fundamental physical quantities describing nuclear properties. By studying nuclear charge radii, information such as the nuclear charge density, the Coulomb potential of nuclei [1], the properties of the nuclear force, shell structure [2,3], halo structure, and neutron radii or skins [4,5,6] can be obtained. The RMS charge radii of stable nuclei can be measured by electron scattering [7] and muonic atom X-ray [8] experiments, while the experimental information on the charge radii of radioactive nuclei is derived from changes in the mean square (MS) charge radii, δ⟨r²⟩, obtained by comparing the isotope shifts [9] of optical spectral lines of two isotopes of the same element. With the development of radioactive ion beams (RIB), nuclei far away from the β-stability line have attracted a lot of interest from nuclear physicists. The charge radii of exotic nuclei can be extracted from charge-changing cross-sections [10,11]. In recent years, the number of unstable nuclei whose RMS charge radii have been measured by laser spectroscopy experiments has increased significantly, and the accuracy of the measured results has improved [12]. Accurate charge radii are also useful for studies in atomic physics and astrophysics. It is therefore interesting to find ways to make more accurate predictions of RMS charge radii.
The nuclear charge radius is defined as
$$R=\sqrt{\langle r^{2}\rangle},\qquad(1)$$
where $\langle r^{2}\rangle$ is the mean square radius of the nuclear charge distribution, so that $R$ is the RMS nuclear charge radius. In the liquid drop model (LDM), the nucleus is regarded as an incompressible drop with an equilibrium density. Because of the uniformity, the RMS charge radius is given by
$$R=\sqrt{\tfrac{3}{5}}\,r_{0}A^{1/3},\qquad(2)$$
where $r_{0}$ = 1.2247 fm [13] and $A$ is the mass number. This estimate is rough for light nuclei [14] and for those away from β-stability. In Ref. [15], a formula is given as follows:
$$R=\sqrt{\tfrac{3}{5}}\,r_{0}A^{1/3}\left(1-a\,\frac{N-Z}{A}+\frac{b}{A}\right),\qquad(3)$$
where $r_{0}$ = 1.2347 fm, $a$ = 0.1428, and $b$ = 2.0743 [13]. Although considerable progress in compensating for the weakness of Equation (2) has been made by Equation (3), the latter still describes an approximately linear relation between charge radii and neutron numbers along an isotopic chain. To reflect the shell effects and the odd–even staggering of charge radii, Ref. [13] fitted the experimental data from Ref. [16] and gave two new formulas:
$$R=\sqrt{\tfrac{3}{5}}\,r_{0}A^{1/3}\left(1-a\,\frac{N-Z}{A}+\frac{b}{A}+c\,\frac{P}{A}\right),\qquad(4)$$
where $r_{0}$ = 1.2320 fm, $a$ = 0.1529, $b$ = 1.3768, $c$ = 0.4286 [13], and $P$ denotes the Casten factor [17]; and
$$R=\sqrt{\tfrac{3}{5}}\,r_{0}A^{1/3}\left(1-a\,\frac{N-Z}{A}+\frac{b}{A}+c\,\frac{P}{A}+d\,\frac{\delta}{A}\right),\qquad(5)$$
where $r_{0}$ = 1.2321 fm, $a$ = 0.1534, $b$ = 1.3358, $c$ = 0.4317, $d$ = 0.1225 [13], and $\delta$ depends on the parity of the proton and neutron numbers. In addition, nuclear charge radii can be calculated with the Hartree–Fock–Bogoliubov (HFB) [18] and relativistic mean field [19,20] theories. These theories have been successful in reproducing more and more accurately measured charge radii, even though they are not confined to extracting nuclear radii.
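As a quick numerical check of these parameterizations, the simple formulas above can be evaluated directly. The minimal sketch below assumes the forms of Equations (2) and (3) as reconstructed here, including the explicit $\sqrt{3/5}$ factor implied by the quoted $r_0$ values, and uses $^{208}$Pb (measured RMS charge radius of about 5.50 fm) as the example.

```python
import numpy as np

SQRT35 = np.sqrt(3.0 / 5.0)  # RMS radius of a uniformly charged sphere of radius r0*A^(1/3)

def r_ldm(A, r0=1.2247):
    """Liquid-drop estimate of the RMS charge radius, Eq. (2), in fm."""
    return SQRT35 * r0 * np.cbrt(A)

def r_extended(Z, N, r0=1.2347, a=0.1428, b=2.0743):
    """Extended A^(1/3) formula, Eq. (3) in the form assumed above, in fm."""
    A = Z + N
    return SQRT35 * r0 * np.cbrt(A) * (1.0 - a * (N - Z) / A + b / A)

print(r_ldm(208))           # ~5.62 fm for 208Pb
print(r_extended(82, 126))  # ~5.55 fm for 208Pb
```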
Machine learning (ML) has an inherent advantage in processing large amounts of data, which is why it has been increasingly and successfully applied to nuclear physics [21,22] and particle physics [23]. Ref. [21] provides a snapshot of ML in nuclear physics, including nuclear theory, experimental methods, accelerator technology and nuclear data. Since the charge radius is a fundamental observable in nuclear theory, it is necessary to be able to accurately calculate the charge radii of nuclei that have not yet been measured, which can be regarded as data mining based on ML. Several ML models have been effectively applied to describe and predict nuclear charge radii [24,25,26,27,28,29,30,31]. These applications are not limited to reducing the RMS deviation over the data set; they also aim to reproduce and extrapolate the charge radii of isotopic chains, whose trends with neutron number can reflect underlying physical information. The calcium isotopic chain is often discussed because of the nearly equal charge radii of $^{40}$Ca and $^{48}$Ca, the apparent odd–even effects between these two isotopes, and the significantly increased radii of the isotopes beyond the neutron magic number N = 28. ML has also been applied to the study of multiple radionuclide identification in nuclear safety [32] and to other nuclear properties such as nuclear masses [33,34], decay rates [35] and fission yields [36]. On the one hand, this article aims to provide a detailed and methodical review of ML for charge radii. On the other hand, CNNs offer significant advantages in image processing because of the spatial structure of the images considered [37,38]. The CNN is also favored by researchers in the field of deep learning because of its shared weights and biases and its translation invariance. We therefore also introduce the CNN approach to directly calculate RMS charge radii, and we intend to achieve better predictions when more nuclear physics quantities are fed into the models.
This article is organized as follows. In the next section (Section 2), a systematic review of ML for charge radii is given. We then briefly introduce CNNs and explain how we apply them to the study of nuclear radii in Section 3. Section 4 presents the results and discussion, and a brief conclusion is given in Section 5.
2. Machine Learning for Nuclear Charge Radii
The charge radius is a fundamental physical quantity that reflects the size of the nucleus, and the application of ML methods to it has been increasing as new experimental data are constantly updated. As early as 2013, S. Akkoyun et al. started using neural network methods for charge radius studies [24]. They tried a feed-forward artificial neural network (ANN) with Z and N as input neurons and the charge radius as the output neuron, and obtained an RMS deviation of 0.025 fm between the experimental charge radii and the ANN results for a 20% test set drawn from 900 nuclei. However, the performance on light nuclei is much worse than that on heavier nuclei. A new mass-dependent formula was then obtained by least-squares fitting of the ANN outputs. When this formula is used as a parameter of the harmonic oscillator basis in the HFB model, the calculated ground-state properties of the Sn isotopes are in good agreement with the experimental values, which provides a good example of combining ML with theoretical models.
In Ref. [25], a Bayesian neural network (BNN) is used to learn the residuals between the experimental data and theoretical predictions. The input of the neural network contains only Z and A. As a combination of the ANN and Bayes' theorem, the BNN, in regression problems, models the conditional distribution of the output values given a set of input values. The prior distributions of the BNN parameters are set as Gaussian distributions with zero mean and with variances determined by hyperparameters, which in turn follow Gamma distributions. The Gaussian-form likelihood function naturally combines the raw residuals and the network outputs, with the experimental errors entering as the variance term. The raw residuals between the experimental values and the predictions of the extended liquid-drop-model formula, Equation (3), as well as of three relativistic energy density functionals, NL3, FSUGold and FSUGarnet, are refined individually and compared, which greatly broadens the scope of applying ML to nuclear theory. The data set of experimental charge radii consists of 820 nuclei. Although the extrapolation and interpolation results for the entire data set are improved by at least 28% and 42%, respectively, after the BNN refinement, this BNN method still struggles to reproduce the Ca isotopic chain.
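The residual-refinement workflow can be illustrated schematically. The sketch below uses a plain scikit-learn MLP as a non-Bayesian stand-in for the BNN of Ref. [25] (it yields point estimates rather than posterior distributions), and the arrays are placeholders; only the overall scheme, learn δ = R_exp − R_theory from (Z, A) and then add the predicted δ back to the theory value, follows the reference.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical arrays: proton number, mass number, approximate experimental radii (fm)
# and placeholder theoretical radii (fm) for a handful of training nuclei.
Z_tr = np.array([20, 28, 50, 82])
A_tr = np.array([40, 58, 120, 208])
r_exp_tr = np.array([3.478, 3.776, 4.652, 5.501])   # approximate experimental values
r_th_tr = np.array([3.50, 3.80, 4.66, 5.52])        # placeholder model output

X_tr = np.column_stack([Z_tr, A_tr])
residual_tr = r_exp_tr - r_th_tr                    # what the network actually learns

net = MLPRegressor(hidden_layer_sizes=(40,), max_iter=5000, random_state=0)
net.fit(X_tr, residual_tr)

# Refined prediction for a new nucleus = theory value + learned residual correction.
Z_new, A_new, r_th_new = 50, 132, 4.71              # hypothetical test nucleus (132Sn)
r_refined = r_th_new + net.predict([[Z_new, A_new]])[0]
print(r_refined)
```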
In Ref. [26], an ANN is used to learn the experimental data of nuclear charge radii, and the input is extended to include the proton number, the neutron number, the electric quadrupole transition strength B(E2) from the first excited 2$^{+}$ state to the ground state, and the symmetry energy. Although the total number of nuclei is only 347, the predictions for the Ca isotopes are evidently improved when the symmetry-energy part is included. The underlying correlation between the symmetry energy and the charge radii of the Ca isotopes is confirmed by HFB calculations with Skyrme interactions, thereby confirming the reliability of the ML result.
In Ref. [27], a naive Bayesian probability (NBP) classifier is trained to tune the nuclear charge radii predicted by the Skyrme–HFB model and by the semi-empirical formula in Equation (4). The classification table is made by dividing the raw residuals of the nuclear charge radii into 10 intervals, and the classification value with the highest probability is taken as the refined value of the raw residual. Ref. [27] calculates the raw deviations between the experimental values and the results predicted by the Skyrme–HFB model and Equation (4), and obtains a standard deviation of 0.0196 fm for the validation set in the extrapolation. In the subsection of Ref. [27] on the NBP refinements for isotopic chains, even though the overall features of the Ca isotopes can be reproduced, the interesting phenomena, such as the nearly identical charge radii of $^{40}$Ca and $^{48}$Ca, as well as the evident odd–even effects for the Ca isotopes between N = 20 and N = 28, are not completely captured.
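As a rough illustration of this classification scheme, the sketch below bins experiment-minus-theory residuals into 10 intervals, trains a Gaussian naive Bayes classifier on (Z, N), and maps the most probable class back to its interval center as the correction to the raw prediction. All arrays, the feature choice and the interval construction are hypothetical; Ref. [27] should be consulted for the actual inputs and classification table.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Hypothetical training data: features (Z, N) and residuals r_exp - r_theory (fm).
X = rng.integers(8, 90, size=(500, 2)).astype(float)
residuals = 0.03 * np.sin(0.1 * X[:, 0]) + 0.01 * rng.normal(size=500)

# Divide the residuals into 10 equal-width intervals and label each nucleus by its bin.
edges = np.linspace(residuals.min(), residuals.max(), 11)
labels = np.clip(np.digitize(residuals, edges) - 1, 0, 9)
centers = 0.5 * (edges[:-1] + edges[1:])

clf = GaussianNB().fit(X, labels)

# Refinement: the most probable class gives the correction added to the raw prediction.
Z_new, N_new, r_raw = 40.0, 60.0, 4.49          # hypothetical nucleus and raw radius (fm)
best_bin = clf.predict([[Z_new, N_new]])[0]
print(best_bin, r_raw + centers[best_bin])
```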
In Ref. [28], a BNN is used to learn the residuals between the experimental data and the predictions of Equation (3). Along with Z and A, two more terms are introduced as inputs, namely a pairing term and the Casten factor P, by which the nuclear pairing effects and the shell-closure effects are taken into account. The study achieves an RMS deviation of 0.0149 fm for the entire set in the medium and heavy mass regions. When extrapolating the charge radii of the Ca isotopes, the BNN fails to predict the odd–even staggering of these nuclei, although the relatively good performance for the potassium isotopes cannot be ignored.
Ref. [29] defines the distance between two nuclei as the Euclidean norm in the (Z, N) plane. A kernel ridge regression (KRR) model with a Gaussian kernel is applied to reconstruct the differences between the experimental values and the results of six phenomenological formulae, and the RMS deviations are improved to about 0.017 fm overall. It should be noted that these six formulae are not only those introduced in Section 1, but also include other parameterizations, such as a formula containing the quadrupole deformation, which can be found in the corresponding reference.
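A minimal version of such a residual KRR can be set up with scikit-learn; the Gaussian (RBF) kernel acts on the Euclidean distance between two nuclei in the (Z, N) plane, as in Ref. [29]. The training arrays and hyperparameter values below are placeholders, not the actual data or fitted settings of that work.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Hypothetical training data: (Z, N) pairs and residuals r_exp - r_formula (fm).
X_train = np.array([[20.0, 20.0], [20.0, 28.0], [50.0, 70.0], [82.0, 126.0]])
y_train = np.array([0.012, -0.008, 0.005, -0.003])

# Gaussian kernel exp(-gamma * ||x - x'||^2), a function of the Euclidean distance
# between two nuclei in the (Z, N) plane; alpha is the ridge penalty.
krr = KernelRidge(kernel="rbf", gamma=0.05, alpha=1e-4)
krr.fit(X_train, y_train)

# Reconstructed residual for a new nucleus, to be added back to the formula value.
print(krr.predict(np.array([[50.0, 72.0]])))
```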
In Ref. [30], an ANN is used to predict the parameters c and z of a two-parameter Fermi (2pF) distribution, which is assumed for the nuclear charge distributions. Two kinds of inputs are used, and the accuracy and precision of the learned parameters are improved when an additional physical quantity is introduced in the latter case. However, the RMS deviation between the experimental charge radii and the 2pF results with the ANN-tuned parameters is 0.07693 fm, which is limited by the form of the 2pF model.
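For context, the RMS radius implied by a two-parameter Fermi distribution follows from its second radial moment and can be obtained by numerical integration. The sketch below uses illustrative c and z values in the range typical of heavy nuclei; it only shows how the learned 2pF parameters translate into a charge radius.

```python
import numpy as np
from scipy.integrate import quad

def rms_radius_2pf(c, z):
    """RMS radius of rho(r) = rho0 / (1 + exp((r - c)/z)), by numerical integration."""
    rho = lambda r: 1.0 / (1.0 + np.exp((r - c) / z))
    num, _ = quad(lambda r: r**4 * rho(r), 0.0, c + 20.0 * z)   # integral of r^4 * rho
    den, _ = quad(lambda r: r**2 * rho(r), 0.0, c + 20.0 * z)   # integral of r^2 * rho
    return np.sqrt(num / den)

# Illustrative 2pF parameters (fm), roughly appropriate for a heavy nucleus.
print(rms_radius_2pf(c=6.62, z=0.55))   # ~5.5 fm
```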
Following the achievements of Ref. [28], the same team went on to apply the BNN to improve the residuals of the charge radii of medium and heavy nuclei in Ref. [31]. In contrast to Ref. [28], an isospin-dependent term and an artificial binary encoding of the Hg isotopes with strong odd–even staggering are added to the input, and an RMS deviation of 0.0139 fm for the testing set is achieved. The extrapolation capabilities of BNNs with four, five and six features as inputs are compared by calculating the RMS deviations of the test data in terms of different mass numbers, extrapolation distances and isospin, which confirms the importance of minimizing model distortion through the manual handling of abnormal data. The BNN with six input quantities performs very well in extrapolation for the proton-rich region of the thallium isotopes and the proton-rich and neutron-rich regions of the calcium isotopes.
Table 1 compares these ML methods and their results, listing the RMS deviations of interpolation and of extrapolation separately. In general, when evaluating charge radii, the interpolation is done by selecting a random portion of the entire data set as the test set and the rest as the training set. The extrapolated data, on the other hand, are selected according to the chronological compilations of the experimental nuclear radius data, with the earlier data used for training and the later updated ones for testing. These data were obtained from Refs. [12,16,39], and the compared data for the other methods in Table 1 are from Refs. [16,39], except for the extrapolation in Refs. [28,31], which used data from Refs. [12,16,40]. A new division of the test sets aimed at extrapolating long isotopic chains is proposed in Ref. [29], in which the six most neutron-rich nuclei are classified into six test sets determined by the extrapolation distances to the nearest isotopes in the training set. It should be noted that the concepts of input and output are, strictly speaking, not appropriate for the NBP classifier method, but the physical quantities are grouped in Table 1 according to the categories divided by the residuals of charge radii and the classification process of Ref. [27].
3. CNN Method
Data features can be efficiently extracted by a CNN, which is why we adopt the CNN method. A typical CNN consists of convolutional layers, pooling layers and fully connected layers. We want the input and output images to have the same pixel size, so only convolutional layers are used in the CNNs we construct. A normal convolutional layer requires a three-dimensional arrangement of neurons, channel × height × width, as input. To illustrate the convolutional layer with an example, refer to Figure 1, where an input of size 3 × 5 × 5 is mapped to a hidden layer of size 2 × 3 × 3. Each neuron in one channel of the hidden layer is connected to a 3 × 3 × 3 region of the input neurons, corresponding to 9 pixels in each input channel. That region of the input image is called the local receptive field of the hidden neuron. The convolution starts with a local receptive field in the top-left corner and then slides the local receptive field one pixel (the stride length) to the right to connect with the second hidden neuron, and so on, across the whole input image, building up the hidden layer. It should be mentioned that the same weights and biases are used for each of the 3 × 3 hidden neurons, which are called shared weights and biases. In practical calculations, for the hidden neuron at position $(j,k)$ in one channel, the output is expressed by
$$z_{j,k}=f\!\left(\sum_{c=0}^{2}\left[b_{c}+\sum_{l=0}^{2}\sum_{m=0}^{2}w_{c,l,m}\,a_{c,\,j+l,\,k+m}\right]\right).$$
Here, $f$ is the neural activation function, for which we use the ReLU function; $b$ is a 3 × 1 × 1 array of shared biases, while $w$ is a 3 × 3 × 3 array of shared weights. Additionally, $a_{c,x,y}$ denotes the input pixel value at position $(c,x,y)$. The shared weights and biases are defined as a kernel or filter. A simple convolutional layer can be implemented by defining the number of input and output channels and the size of the kernel or filter.
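The mapping described above corresponds to a standard convolutional layer. A minimal PyTorch sketch reproducing the 3 × 5 × 5 to 2 × 3 × 3 example (3 input channels, 2 output channels, a 3 × 3 kernel, stride 1, no padding) is given below; note that in PyTorch the shared bias is a single number per output channel.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, stride=1, padding=0)
act = nn.ReLU()

x = torch.randn(1, 3, 5, 5)        # one input "image": 3 channels of 5 x 5 pixels
h = act(conv(x))                   # hidden layer

print(h.shape)                     # torch.Size([1, 2, 3, 3]), as in Figure 1
# Each output channel shares one 3 x 3 x 3 weight kernel and one bias over all positions.
print(conv.weight.shape, conv.bias.shape)   # (2, 3, 3, 3) and (2,)
```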
In our work, CNNs readily achieve high accuracy in describing the training data, while making reliable predictions is more difficult. Therefore, selecting appropriate neural network structures and network inputs is one of the major tasks of this work.
Figure 2 shows the two kinds of network structures, labeled C1 and C2, respectively. C1 is very common in deep learning; it is actually a convolutional network containing a residual block. Taking "36, 3 × 3 conv1" in C1 as an example, the description reads as follows: conv1 indicates the current convolutional layer, while 36 and 3 × 3 denote the number of output channels and the kernel size of this layer, respectively. The structure of C2 is borrowed from the deep convolutional networks that achieve image super-resolution [41]. CNNs are employed to process images, so the 6 × 102 × 158 matrix diagram is filled with six physical quantities of the nuclei according to the layout shown in Figure 3. It should be added that, when filling in the charge-radius matrix, we use the experimental values for nuclei with measured charge radii, while the values for nuclei that have not been measured are calculated by Equation (3). The data set of the binding energy per nucleon is taken from Ref. [42].
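The detailed layer configurations of C1 and C2 are specified in Figure 2. Purely as an indication, a residual convolutional block of the kind contained in C1, assuming 6 input channels, the "36, 3 × 3" convolutions mentioned above, and padding that preserves the spatial size (all of which are assumptions for illustration rather than a reproduction of the figure), could be written in PyTorch as follows.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative residual block: two 3 x 3 convolutions plus a skip connection."""
    def __init__(self, channels=36):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        y = self.act(self.conv1(x))
        y = self.conv2(y)
        return self.act(y + x)          # the skip connection preserves the resolution

# 6 input channels -> 36 feature channels -> residual block -> 1 output channel,
# with padding chosen so the output keeps the same height and width as the input.
model = nn.Sequential(
    nn.Conv2d(6, 36, kernel_size=3, padding=1), nn.ReLU(),
    ResidualBlock(36),
    nn.Conv2d(36, 1, kernel_size=3, padding=1),
)
print(model(torch.randn(1, 6, 102, 158)).shape)   # torch.Size([1, 1, 102, 158])
```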
P is the Casten factor [17], defined by
$$P=\frac{N_{p}N_{n}}{N_{p}+N_{n}},$$
where $N_{p}$ and $N_{n}$ denote the numbers of valence protons and valence neutrons, respectively, counted from the nearest closed shell. In this work, the proton and neutron magic numbers are taken as Z = 2, 8, 20, 28, 50, 82 and N = 2, 8, 20, 28, 50, 82, 126. I is the relative neutron excess [15], given by
$$I=\frac{N-Z}{A},$$
which is associated with the isospin.
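Both quantities are straightforward to compute. A short helper using the magic numbers listed above is sketched below; the valence numbers are counted from the nearest closed shell, as stated.

```python
MAGIC_Z = (2, 8, 20, 28, 50, 82)
MAGIC_N = (2, 8, 20, 28, 50, 82, 126)

def valence(n, magic):
    """Number of valence nucleons, counted from the nearest closed shell."""
    return min(abs(n - m) for m in magic)

def casten_factor(Z, N):
    """P = Np*Nn / (Np + Nn); zero when both valence numbers vanish."""
    Np, Nn = valence(Z, MAGIC_Z), valence(N, MAGIC_N)
    return 0.0 if Np + Nn == 0 else Np * Nn / (Np + Nn)

def neutron_excess(Z, N):
    """Relative neutron excess I = (N - Z) / A."""
    return (N - Z) / (Z + N)

print(casten_factor(38, 62), neutron_excess(38, 62))   # e.g. 100Sr
```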
We hope to establish, within such isotope matrices, a connection between the charge radius of one nucleus and the physical quantities associated with itself and the surrounding nuclei. Naturally, it is not necessary to consider a Cm isotope when the charge radius of a Li isotope is calculated. We want the size of the CNN output to be consistent with that of the input, and the value in the Zth row and Nth column of the output image represents the charge radius of the nucleus with proton number Z and neutron number N. Based on the defined convolutional layers, it can be inferred which part of the data in the isotope matrices is actually used in the calculation of one nuclear radius. Thus, for each calculated nucleus, we select a region of size 13 × 13 centered on it in the filled isotope matrices as the input image of the CNN. As an example, in Figure 3, a 13 × 13 region centered on an oxygen isotope is framed by the green dotted box, which is the input image of that nucleus. This division of the input image avoids redundant filtering of the kernel over the full matrices, saving computational effort. As mentioned before, the experimental values are used to fill in the image of the charge-radius channel, so the central value of the charge-radius channel is set to zero to ensure that the experimental datum is not involved in the calculation of the corresponding nucleus. In this way, we obtain a 6 × 13 × 13 input image for each nucleus.
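The construction of the input image for a single nucleus can be summarized as follows: extract a 13 × 13 window centered on (Z, N) from each of the six channel matrices, pad with zeros where the window extends beyond the chart, and blank the central pixel of the charge-radius channel so that the target value is not fed back in. The numpy sketch below uses placeholder matrices, and the channel ordering is an assumption; the actual layout is that of Figure 3.

```python
import numpy as np

N_CHANNELS, HEIGHT, WIDTH, HALF = 6, 102, 158, 6       # 13 x 13 window -> half width 6
channels = np.random.rand(N_CHANNELS, HEIGHT, WIDTH)   # placeholder isotope matrices
R_CH = 2                                               # assumed index of the radius channel

def input_image(Z, N):
    """6 x 13 x 13 input image for the nucleus (Z, N), zero-padded at the chart edges."""
    img = np.zeros((N_CHANNELS, 2 * HALF + 1, 2 * HALF + 1))
    for dz in range(-HALF, HALF + 1):
        for dn in range(-HALF, HALF + 1):
            z, n = Z + dz, N + dn
            if 0 <= z < HEIGHT and 0 <= n < WIDTH:
                img[:, dz + HALF, dn + HALF] = channels[:, z, n]
    img[R_CH, HALF, HALF] = 0.0        # hide the experimental radius of the target nucleus
    return img

print(input_image(8, 8).shape)         # (6, 13, 13), e.g. for 16O
```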
4. Results and Discussion
All the RMS charge radii are taken from the 2013 compilation [16], and the nuclei that do not exist in the 2004 compilation [39] are picked out as the test set when extrapolating. The numbers of Y and Pb isotopes, for example, have been expanded from 1 to 16 and from 23 to 32, respectively, between the two compilations. Overall, 820 nuclei beyond $^{40}$Ca were discussed with the BNN in Ref. [25], while Ref. [27] introduced the NBP classifier to analyze 896 nuclei. The detailed selections can be found in the corresponding references, from which we have taken the best extrapolation results for comparison here.
The extrapolation properties of the CNN method are discussed for the two considered models. In addition, another kind of input image is also discussed: in the numerical matrix enclosed by the green dashed line in Figure 4, only the values on the pixels framed by the red dashed line are retained for all channels except the charge-radius channel. To distinguish them, we denote the previously obtained input as Input 1 and this input, which does not take into account the corresponding quantities of the surrounding nuclei, as Input 2. We individually calculate the nuclei beyond $^{40}$Ca and the nuclei in a wider mass range, and the results are presented in Table 2 and Table 3, respectively. When light nuclei are ignored, in Table 2, the RMS error of the training set of these CNNs is reduced by 0.01 fm compared with the result of the BNN, and a similar improvement is obtained on the test set. Comparing the two models, C1 is more suitable for medium and heavy nuclei. In the calculation for the wider mass range, 897 nuclei are collected in total, but we remove the offending Li and Be isotopes from the test set. Although it is not reasonable in general to remove badly predicted nuclei, their absolute errors are really large, sometimes over 0.1 fm. There is no lithium isotope in the training data, and only one Be isotope is involved in the learning. When the 13 × 13 input-region images are segmented, the empty parts are filled with zeros after the nuclei are placed at their respective central pixels. Therefore, to obtain better results with CNNs, it should be worthwhile to adjust the input images of light nuclei. The extrapolating ability can be seen when the 68 low-mass nuclei among the 786 training data and the 7 light nuclei among the 109 test data are taken into consideration. The model C2 seems to perform better over a broader mass range, according to Table 3. Turning to the commonality of extrapolation between the two mass ranges, from Table 2 and Table 3 it can be concluded that both C1 and C2 have better extrapolating abilities when using Input 1, which involves the quantities of the surrounding nuclei, and an extrapolating error of 0.015 fm can be obtained.
The number of nuclear charge radii has been expanded again in the 2021 compilation [12], in which the latest RMS charge radii of 236 nuclei measured by laser spectroscopy experiments are compiled. Combining the three compilations, 1027 charge radii are used in the subsequent calculations in aggregate. We randomly choose 80% of these data as the training set, and the remaining nuclei are naturally classified as the test set. The two CNN models, each with the two input forms, are again discussed in this interpolation. The results of five random divisions of the data set are listed in Table 4. It is evident that using Input 2 gives better predictions for both C1 and C2, which differs from the performance in the extrapolations. C2 has a less obvious advantage for random predictions.
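The interpolation protocol described here (a random 80/20 division and the RMS deviation of the test predictions) can be written compactly; the arrays below are placeholders rather than the actual compiled data or CNN outputs.

```python
import numpy as np

rng = np.random.default_rng(seed=1)            # one of the five random divisions

r_exp = 3.0 + rng.random(1027)                 # placeholder for the 1027 compiled radii (fm)
idx = rng.permutation(len(r_exp))
n_train = int(0.8 * len(r_exp))
train_idx, test_idx = idx[:n_train], idx[n_train:]

# r_pred would come from the trained CNN; here it is a placeholder with small scatter.
r_pred = r_exp + 0.01 * rng.normal(size=len(r_exp))

rms_test = np.sqrt(np.mean((r_pred[test_idx] - r_exp[test_idx]) ** 2))
print(f"test-set RMS deviation: {rms_test:.4f} fm")
```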
Figure 5 shows the deviations between the experimental charge radii and the output of C2 with Input 2; only the results of data divisions 1 and 2 are presented, which makes the discrepancy among different random test sets more explicit. The very light nuclei in test set 1 are really obtrusive, and the prediction error for the Be isotope among them is even around 0.15 fm. The relatively large disparity between the two test sets therefore originates, to a large extent, from these very light nuclei. In more detail, the nearly identical tendencies over the different Z ranges for these four sets show that the well-trained models can make predictions that are close in quality to the results of the learning.
Figure 6 vividly shows the differences between the experimental data and the calculations of C2 fed with Input 2. The corresponding model results are those of random interpolation group 2 in Table 4, and the RMS error of the entire set is 0.0108 fm. The majority of the calculated charge radii differ from the experimental values by less than about 0.01 fm. The positions of the nuclei with larger errors, labeled by red and black pixels, are distributed similarly in the training and test sets, and are mainly concentrated in the edge zones. Hence, when the data of all 1027 nuclei are fed into such a model, it should be possible to predict the unknown charge radii well.
To provide a more intuitive understanding of this work and to allow a comparison with the other ML methods in Table 1, Table 5 summarizes our application of the CNN method and its results.
C2 with Input 2 has also been used to predict the charge radii of several isotopic chains, to give a straightforward impression of the network's performance. The shell effect is an intriguing nuclear property, and its visibility in charge radii has also been studied [43,44]. We choose the Sr (Z = 38) and Ba (Z = 56) isotopic chains, as well as two chains of isotones (one with N = 118), to validate the prediction of the shell effect, and each chain is tested individually. In Figure 7, the C2 predictions for the charge radii of these nuclei are shown and compared with the corresponding experimental values. The transitions of the charge radii at the magic numbers are well reproduced. Almost all nuclei of the isotonic chains are perfectly reproduced; after all, when the charge radii of isotones are predicted, their isotopes participate in the learning process. For the Ba isotope that is forecast not only in the isotopic chain but also in the isotonic chain, Figure 7 shows that it is more difficult to predict a single nucleus whose isotopes are not included in the training. C2 performs relatively poorly in regenerating the charge radii of the Ba isotopic chain, especially for the nuclei near the drip-line areas. For the halves of the Sr and Ba chains on the left of the neutron magic number, the C2 predictions tend to form a smooth arc. Thus, the model slightly fails to reproduce the small fluctuations of the left-half isotopes of the Ba chain.
Figure 8 sequentially compares the predicted charge radii of the four isotopic chains of Ca (Z = 20), Zn (Z = 30), Zr (Z = 40) and Pb (Z = 82) with their experimental data. The predicted outputs for the Zn and Pb isotopes are in good agreement with the experimental values; the trends of the charge radii of these two chains are indeed relatively smooth. A shape transition [45] occurs in the Zr isotopes, which also contributes to the appearance of shape coexistence [46], and an abrupt increase of the charge radii across this transition can be seen in Figure 8. Similar behavior is known to occur at N = 60 in the Rb (Z = 37), Sr (Z = 38) [47,48] and Y (Z = 39) isotopic chains, and at N = 90 in the Nd (Z = 60), Sm (Z = 62), Gd (Z = 64) and Dy (Z = 66) chains [45]. However, C2 is overwhelmed by such a sudden transition, as seen from the predictions for the Sr and Zr isotopes in Figure 7 and Figure 8, respectively. The poor performance on the Ca isotopes is also obtrusive. It is a pity that the odd–even staggering of the Ca isotopes between $^{40}$Ca and $^{48}$Ca has not been reproduced successfully, but the predicted gap between $^{40}$Ca and $^{48}$Ca is acceptably small.