Article

Evaluation of Parallel Computing on MPI Version PHITS Code

Hyeok-Jun Gwon, Sun-Boong Hwang, Sangrok Kim and Kum-Bae Kim
1 Department of Radiation Oncology, Korea Institute of Radiological & Medical Sciences, Seoul 01812, Republic of Korea
2 Radiation Safety Section, Korea Institute of Radiological & Medical Sciences, Seoul 01812, Republic of Korea
3 Research Team of Radiological Physics & Engineering, Korea Institute of Radiological & Medical Sciences, Seoul 01812, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(6), 3782; https://doi.org/10.3390/app13063782
Submission received: 30 January 2023 / Revised: 8 March 2023 / Accepted: 13 March 2023 / Published: 16 March 2023

Abstract

The Message Passing Interface (MPI) technique is a long-established remedy for the enormous computational time of Monte Carlo N-Particle Transport (MCNP) calculations, but it has not yet been evaluated for the recently developed Monte Carlo simulation code PHITS. We conducted simulations using the IAEA phase space data of a Varian Clinac iX 6 MV beam. The method and criteria of Venselaar et al. were used to validate the Monte Carlo simulation. The PC cluster was also tested with respect to the number of processors and bch, the number of batches into which the total calculation is divided. The speedup factor of the MPI version of the PHITS code and the K-factor, which represents the serial portion of the cluster, were both evaluated. All calculated data met the criteria except δ2, which corresponds to the high-dose, high-gradient region of the beam profile data set. The PC cluster with MPI clearly outperformed a single node, by up to 70.6%, and the speedup factor tended to follow Amdahl's law, while the K-factor saturated at a certain value. The study concludes that the cluster has limitations arising from its serial components; considering how improvements in hardware specifications affect the simulation time, this cluster system could be made more effective.

1. Introduction

The Monte Carlo method has been utilized to evaluate the dose distribution of complex structures by simulating radiation behavior, and numerous codes have been developed for a variety of applications [1,2,3,4]. The Particle and Heavy Ion Transport Code System (PHITS), recently developed in Japan, offers a variety of functions, such as converters that translate Digital Imaging and Communications in Medicine (DICOM) files and International Atomic Energy Agency (IAEA) phase space (phsp) data files into PHITS input files, as well as a research- and practice-friendly environment for applications such as radiation therapy research and shielding design [5].
Indeed, several efforts have been made to model linear accelerators (Linacs) in order to assess in- and out-of-field dose distributions or radiation behavior. Other studies have shown that this is feasible, but it requires exact blueprints from the machine manufacturers and complicated procedures to model the Linac head. To overcome these problems, the IAEA provides a phsp database, defined as a collection of representative pseudo-particles emerging from a radiotherapy source, covering radioisotope sources and a wide range of Linac beam data, together with each particle's properties such as energy, particle type, position, direction, and statistical weight. Unfortunately, this database cannot be used directly with some MCNP codes. In PHITS, the user can utilize it through the PSFC4PHITS conversion tool [6,7,8].
However, this approach also requires a large number of iterative calculations to obtain reliable results, and it is difficult for users with insufficient computing infrastructure to evaluate and provide feedback quickly [9]. While various methods have been proposed and used to solve this problem [10,11,12], in some cases they impose a burden, such as writing a parallel processing algorithm in Python [13,14]. On the other hand, the parallel computing method utilizing the Message Passing Interface (hereafter referred to as MPI) is the most fundamental approach and was introduced at the outset of performance-enhancement efforts for MCNP. It is a basic technique for achieving an increase in computing power that is, ideally, proportional to the number of processors used for the computation [15].
In a study by J.C. Wagner et al. (1996), the results of parallel computing of the MCNP code using MPI depended on the number of processors, and Amdahl's law, which converges to a constant value because part of the work cannot be parallelized, was applied to these results [15]. Deng and Xie (1999) evaluated the performance improvement of parallel computing of MCNP using MPI and the parallel virtual machine (PVM), reporting that the MPI approach was more efficient and confirming that its performance improvement increased linearly, with an error of up to 5% for up to 32 processors and 13% when using 64 processors [16]. Usang et al. evaluated the results of MCNP parallel computing with MPI as a function of the number of processors and provided recommendations for the number of processors required for efficiency based on the amount of work [17].
As shown in the previous research, studies on the efficiency of the Monte Carlo method using MPI have mainly focused on the MCNP code, and research results on the relatively recently developed PHITS code are scarce. For the abovementioned reasons, this study was conducted to verify the IAEA phase space data converted into a source input file through the PSFC4PHITS tool in the PHITS code package and to evaluate the efficiency of a cluster system based on Amdahl's law.

2. Materials and Methods

2.1. Radiation Source and Water Phantom Setup

In this study, PHITS (version 3.26, Japan Atomic Energy Agency, Japan) was used for the Monte Carlo simulation, and the phase space data (Nuclear Data Section, IAEA) of a Clinac iX (Varian Medical Systems, Palo Alto, CA, USA) 6 MV beam provided by the IAEA were used as the radiation source. As shown in Figure 1, the absorbed dose was obtained in a water volume of 1 m³ placed at a distance of 100 cm from the source to the surface, usually known as the source-to-surface distance (SSD). This setup was used to calculate the beam profile and percent depth dose (PDD), which were computed for 4 cm × 4 cm and 10 cm × 10 cm field sizes at depths of 15 mm, 50 mm, 100 mm, and 200 mm, matching the actual measurement conditions. For the beam profile, the tally cells were laid across the middle of the virtual water phantom at each depth, perpendicular to the beam direction. Tally cells for the PDD were arranged in the same way but parallel to the beam direction. For the material compositions required for the simulation, the material information provided by the Pacific Northwest National Laboratory (PNNL) was used [18].

2.2. MPI Activation

The MPI function of the PHITS code is structured to use the CPU resources of each computer connected via the local network. When setting the total number of simulations in PHITS, the terms "bch" and "cas" are used, meaning the number of batches per run and the number of calculations (histories) per batch, respectively. As these definitions imply, the product of bch and cas is the total number of iterations. Once the participating processors and the product of bch and cas are set, the workload is distributed to each processor. At the end of every batch, the calculation results from the individual processors are automatically summed before the next batch begins. The MPI version of the PHITS code proceeds with the calculation in the following manner:
  (a) Node 1 reads the input file, including bch, cas, and the number of processors.
  (b) The workload is divided equally and distributed to the participating CPUs.
  (c) Each CPU sends its result to Node 1 through the switching hub via the MPI protocol.
  (d) Node 1 receives these results and combines them at the end of every batch.
Procedures (b) to (d) are repeated until the total workload is completed. The computing structure used in this study was configured as shown in Figure 2.
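To illustrate this batch-wise distribute-and-collect pattern (this is a conceptual sketch, not the internal implementation of PHITS), a minimal mpi4py example is shown below; the batch and history counts and the run_histories routine are hypothetical placeholders.

```python
from mpi4py import MPI  # requires an MPI runtime such as MPICH
import random

comm = MPI.COMM_WORLD
rank = comm.Get_rank()     # processor index; rank 0 plays the role of Node 1
size = comm.Get_size()     # total number of participating processors

bch, cas = 30, 30_000      # batches per run and histories per batch (illustrative)
quota = cas // size        # histories assigned to each processor per batch

def run_histories(n, seed):
    """Placeholder for the per-processor Monte Carlo work; returns a partial tally."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(n))  # dummy scoring

total = 0.0
for batch in range(bch):
    partial = run_histories(quota, seed=batch * size + rank)
    # At the end of every batch, the partial results are summed on rank 0 (Node 1).
    batch_sum = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        total += batch_sum

if rank == 0:
    print("combined tally:", total)
```

In an actual PHITS run this loop is internal to the code; the user only sets bch, cas, and the number of processors and launches the job with mpiexec, as described in the following subsections.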
In this study, three comparable computers were designated as calculation nodes, and the term "node" was used in consideration of each computer's role. Each node has a CPU with two cores, and each core can be divided into two processors, so our cluster is equivalent to twelve processors connected through a switching hub. The performance of these nodes is tabulated in Table 1.
Because iterative calculations are required to reach reasonable relative errors, the performance of MCNP parallel computing is affected by differences in CPU performance [13]. In this study, therefore, the CPU performance of the nodes was limited to a similar level to eliminate this effect. The cluster construction relevant to the calculation was performed in the following order, and the entire procedure is depicted in Figure 3.

2.2.1. MPICH2 Installation and Registration

In this study, the open-source software MPICH2 (distribution file name: mpich2-1.4.1p1) provided by Argonne National Laboratory was used. After installing MPICH2, the account of each node to be used for calculation was registered in the program, and the path of the MPICH2 program was added to the system environment variable for smooth operation.

2.2.2. Building a Local Network

For communication between nodes, a NEXT-504N (Easynet Ubiquitous) was used as the switching hub. For the smooth operation of the programs (Phits_mpi.exe, mpiexec, and smpd) needed for the PHITS code to send and receive data through the hub, the firewall rules of each node were modified. In addition, a shared work folder was set up to exchange the results of each batch, and the same folder structure was applied on every node to reduce the required time and prevent confusion.

2.2.3. PHITS Code Compilation

In order to perform parallel computing between processors, the PHITS code was compiled so that the CPUs of the nodes could recognize each other. The compilation used the open-source toolchain MinGW-w64, developed by Tietz et al., together with Gfortran and the executable file included in the PHITS code package.

2.3. Verification of Computational Simulation

In this study, the beam profile and PDD were measured to evaluate the reliability of the simulation. The 6 MV beam of the Varian Clinac iX (field sizes: 4 cm × 4 cm and 10 cm × 10 cm) was measured at depths of 15 mm, 50 mm, 100 mm, and 200 mm from the surface of a PTW BeamScan water phantom (PTW, Freiburg, Germany). A Semiflex ionization chamber (S/N 1278, PTW 31010, Freiburg, Germany) was used for the measurements, and the analysis was performed using PTW Mephysto mc2 (PTW, Freiburg, Germany). Equation (1) and the criteria defined in the study by Venselaar et al. were used for verification:
$$ \delta_i = \frac{D_{\mathrm{cal}} - D_{\mathrm{meas}}}{D_{\mathrm{meas}}} \times 100, \qquad i = 1\text{--}4,\ 50\text{--}90 \tag{1} $$
where $D_{\mathrm{cal}}$ and $D_{\mathrm{meas}}$ denote the simulated and measured values, respectively. δ1 is the criterion for the fall-off region of the PDD, where the dose decreases after the build-up region, and δ2 is the criterion for the build-up region, i.e., the relatively high-dose, high-gradient part. δ3 covers the high-dose, low-gradient region inside the irradiated field, while δ4 evaluates the low-dose, low-gradient area of the beam profile outside the irradiated field. The distance between the 50% and 90% dose points, also known as the beam fringe, is evaluated as δ50–90 [19,20].
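As an illustration only (not the authors' analysis code), the percentage deviation in Equation (1) can be evaluated for paired simulated and measured dose samples as in the short sketch below; the dose values and the 2% tolerance are hypothetical placeholders.

```python
import numpy as np

# Hypothetical paired dose samples (arbitrary units) along one profile
d_meas = np.array([100.0, 98.5, 95.2, 60.3, 5.1])   # measured
d_cal  = np.array([101.1, 99.0, 94.0, 58.9, 5.9])   # simulated

# Equation (1): percentage deviation of the calculation from the measurement
delta = (d_cal - d_meas) / d_meas * 100.0

# Example tolerance check, e.g. against the 2% criterion used for delta_3
criterion = 2.0  # percent; the applicable limit depends on the region evaluated
print(delta)
print(np.abs(delta) <= criterion)
```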

2.4. MPI Evaluation

Prior to the cluster evaluation, the performance of each individual node was evaluated using only three processors per node, analyzed with the PDD simulation of 3 × 10⁸ histories in a 10 cm × 10 cm field size. MPI performance was then analyzed by dividing the total number of processors into 3, 6, 9, and 12. Here, it is assumed that a time delay occurs in the process of integrating the data produced by each processor. Therefore, in order to evaluate the effect of this process on the total required time, the total number of iterations was kept constant while bch was varied. For the time required for the operation, the average CPU time reported by the PHITS code upon completion of the simulation was used, and the speedup factor, calculated by Equation (2), was examined for each bch.
$$ \text{Speedup Factor} = \frac{\text{Average CPU Time}_{3}}{\text{Average CPU Time}_{N}} \tag{2} $$
where $\text{Average CPU Time}_3$ and $\text{Average CPU Time}_N$ refer to the cases with 3 and N processors, respectively. In addition, besides the work performed by the individual processors, there are portions of the run time that cannot be parallelized, such as data communication and collection. Since this fraction differs from one parallel processing system to another, it is denoted as the K-factor in this study. To evaluate this factor, an appropriate value for each bch was obtained from Equation (3) and the calculated speedup factors using the least squares method.
$$ \text{Speedup Factor} = \frac{1}{K + \dfrac{1 - K}{S}} \tag{3} $$
where K is the K-factor of our cluster system and S is the ratio of the number of processors used to the minimum of three required for the cluster; in our case, S is 1, 2, 3, and 4 for 3, 6, 9, and 12 processors, respectively [21].
For the data, the average of five values of the average CPU time required to perform the PDD simulation (3 × 10⁸ histories in a 10 cm × 10 cm field size) was used for the analysis.
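As a sketch of how the K-factor can be extracted from measured run times (not the authors' implementation), the example below computes the speedup factors of Equation (2) and fits K with Equation (3) by least squares; the CPU times for 6, 9, and 12 processors are illustrative values back-calculated from the 30 bch speedup factors reported in Table 7.

```python
import numpy as np
from scipy.optimize import curve_fit

# Average CPU times (hours) for 3, 6, 9, and 12 processors at one bch setting.
# The 3-processor value follows Table 6 (30 bch); the others are illustrative.
processors   = np.array([3, 6, 9, 12])
avg_cpu_time = np.array([3.63, 2.21, 1.88, 1.60])

# Equation (2): speedup relative to the three-processor cluster
speedup = avg_cpu_time[0] / avg_cpu_time

# Equation (3): Amdahl-type model, K = non-parallelizable fraction,
# S = ratio of the processor count to the three-processor baseline
def amdahl(s, k):
    return 1.0 / (k + (1.0 - k) / s)

s_ratio = processors / 3.0
k_fit, _ = curve_fit(amdahl, s_ratio, speedup, p0=[0.3], bounds=(0.0, 1.0))
print(f"Fitted K-factor: {k_fit[0]:.3f}")
```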

3. Results

3.1. Computational Simulation Evaluation

In order to compare with the measured values, 10⁹ histories were calculated at depths of 15 mm, 50 mm, 100 mm, and 200 mm in the 1 m³ water phantom implemented based on the PNNL data (relative error ≤ 0.05). For the beam profile at the 4 cm × 4 cm field size, the results were within the presented reference range except for δ2. Looking at each part of the profile, δ3 ranged from a minimum of 0.2% (200 mm, cross-plane) to a maximum of 1.2% (50 mm, in-plane and cross-plane). The other cases were also within the reference range of 2%, but δ2, the value for the build-up region or the relatively high-dose, high-gradient part, showed a maximum of 34.8% (50 mm, in-plane) and a minimum of 6.1% (200 mm, in-plane); these values significantly exceed the 3% criterion. The data for the relatively low-dose, low-gradient area of the beam profile showed a maximum of 22.5% (15 mm, cross-plane) and a minimum of 4.7% (200 mm, in-plane). Although these values appear large, they are still acceptable because the criterion for this region is 30%. Compared with the previous two results, the evaluation value for the beam fringe, δ50–90, was stably within the reference range (Table 2).
For the 10 cm × 10 cm field size, the results showed the same tendency, although the details differ slightly. For example, δ3 ranged from a minimum of 0.5% (15 mm, in-plane and 200 mm, cross-plane) to a maximum of 0.9% (50 mm, in-plane and cross-plane), and the other cases were also within 1%, much smaller than the reference range of 2%. However, δ2 showed a maximum of 22.3% (15 mm, in-plane) and a minimum of 5.3% (200 mm, cross-plane), again exceeding the reference value of 3%. The evaluated values for δ4 showed a maximum of 30.1% (15 mm, cross-plane) and a minimum of 4.6% (100 mm, cross-plane); in contrast to the in-plane data, the cross-plane results slightly exceeded the reference value at 15 mm while showing relatively large values at several depths. The beam fringe, δ50–90, again gave stable results, with values of 1 mm to 2 mm falling within the criterion (Table 3).
Figure 4 shows the beam profile comparison between the measured data and the data simulated by PHITS. The in-plane profiles of both field sizes are sorted in the order of 15 mm, 50 mm, 100 mm, and 200 mm. A constant magnification factor was applied for convenience, to make it easier to inspect many data sets at once: 1, 0.8, 0.6, and 0.4 for 15 mm, 50 mm, 100 mm, and 200 mm, respectively.
Figure 5 shows the cross-plane profiles of the calculated and measured data, also sorted in the order of 15 mm, 50 mm, 100 mm, and 200 mm, with the same magnification factors applied.
There are two criteria for the PDD, δ1 and δ2, and both evaluated values met the reference values. Specifically, the evaluation of the fall-off region after the build-up area, δ1, was 2.0% and 1.1% for the 4 cm × 4 cm and 10 cm × 10 cm field sizes, respectively. The discrepancy of the build-up region was 2 mm and 0 mm for the 4 cm × 4 cm and 10 cm × 10 cm field sizes, respectively (Table 4).
No magnification factor was applied to the PDD data. The discrepancy of the build-up region for the 4 cm × 4 cm field size was determined to be 2 mm because the build-up depths were 14 mm and 16 mm for the measured and simulated data, respectively. For the 10 cm × 10 cm field size, an exact match was found, with both build-up depths at 14 mm. These discrepancies are acceptable for our purpose, considering the abovementioned criterion (Figure 6).

3.2. MPI Performance Evaluation

3.2.1. Individual Node and Three Processor Clusters

The simulation results at 30,000 bch were 6.35 ± 0.09 h, 5.97 ± 0.2 h, and 6.14 ± 0.13 h for Node 1, Node 2, and Node 3, respectively. When bch was set to 30, the required average CPU times were 6.0 ± 0.04 h, 5.61 ± 0.03 h, and 5.79 ± 0.02 h. From this relationship, the required times tend to decrease as bch decreases. In all bch cases, the individual nodes ranked, from fastest to slowest, Node 2, Node 3, and Node 1, with a maximum difference of 6.3% at 300 bch and a minimum of 2.4% at 30,000 bch (Table 5, Figure 7).
On the other hand, the results obtained by a cluster consisting of one processor from each node (three processors in total) were 3.63 ± 0.02 h, 3.6 ± 0.01 h, 3.68 ± 0.02 h, and 3.72 ± 0.02 h for 30 bch, 300 bch, 3000 bch, and 30,000 bch, respectively. The tendency is similar to that of Table 5, except that the 30 bch result listed in Table 6 is slightly higher than that for 300 bch. Nevertheless, these results show a performance increase of up to 70.6% (30,000 bch, compared with Node 1) and at least 54.5% (30 bch, compared with Node 2) (Table 6).
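As a worked check of these percentages against Table 5 and Table 6, the 30,000 bch comparison with Node 1 gives (6.35 − 3.72)/3.72 × 100 ≈ 70.7%, matching the reported 70.6% up to rounding of the tabulated times, and the 30 bch comparison with Node 2 gives (5.61 − 3.63)/3.63 × 100 ≈ 54.5%.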

3.2.2. Speedup Factor and K-Factor

When the number of processors was increased from 3 to 6, the maximum increase of 64.0% was observed at 30 bch; from 6 to 9 and from 9 to 12, the speedup increased by 17.6% each. For 300 bch, the increases were 60.0%, 19.3%, and 19.8% as the number of processors increased, and the same trend was observed for 3000 bch. However, for 30,000 bch, the speedup decreased by 2.5% from 6 to 9 processors. In general, the speedup factor tended to saturate as the number of processors increased (Table 7, Figure 8).
The K-factor was calculated by the least squares method from the speedup factors and Equation (3) in order to evaluate the non-parallelizable portions of the total operation time, such as data processing and transmission/reception. The derived values were 0.402, 0.313, 0.259, and 0.258 for 30,000 bch, 3000 bch, 300 bch, and 30 bch, respectively; the K-factor tended to plateau as bch decreased (Table 8).

4. Discussion

The phsp data provided by the IAEA yielded results that matched the measured values well. However, in the high-dose, high-gradient part, the deviations were significantly higher than the reference value. Considering that there is no notable difference in the comparison graphs of the measured and simulated values according to the location of the ionization chamber, this may be because the corresponding region is relatively short and contains little comparison data, so that minute differences in the contributing factors cause rapid changes. The results presented in this study are similar to those of Bednarz et al. [7]; according to their research, large differences in the out-of-field region (δ2 and δ4 in our study) are unavoidable because of the lower radiation dose and the larger uncertainties in the calculated data. The results presented by Berris et al. are likewise comparable to ours: in their study, there was almost a 60% difference in the out-of-field regions but differences of only 2% and 2 mm in the plateau region [6]. Considering that the abovementioned study was conducted by modeling the linear accelerator itself, our results are acceptable as well. Nevertheless, in order to obtain a reliable dose distribution through phsp data in the future, we suggest that careful consideration, or additional work to reduce this discrepancy, is necessary (Figure 4, Figure 5 and Figure 6).
It was confirmed that there is a difference in simulation performance even when the primary specifications of the nodes, such as clock speed (GHz) and RAM size, are nearly the same [13]. This was one of the factors that delayed the overall required time, since the faster nodes had to wait for the calculation results of the slower nodes when the results distributed by bch were collected. Therefore, for more effective cluster operation, it may be necessary to evaluate the performance of each node in advance when configuring the cluster.
Additionally, our study revealed another trend related to bch that can affect the simulation time; this tendency is clearly observed in Figure 7. Each time bch was reduced by a factor of ten, which is equivalent to increasing the number of calculations per batch tenfold, the required average CPU time also decreased. The speedup factor results indicated a similar trend: the increments of the speedup factor become greater as bch decreases. For a more detailed analysis, the comparison between 30 bch and 30,000 bch is notable, and similar tendencies appear in the results of Table 5 and Table 6. The increment of the speedup factor from 30,000 bch to 30 bch showed the largest increase (300 bch with 12 processors shows the highest value of 2.29, but considering the standard deviation, there is some uncertainty), and this trend was observed throughout all of the data. These trends indicate that the result collection process after each batch influences the simulation time much more than we expected.
Again, from the results of the speedup factor according to bch, it can be inferred that, as the number of calculations per batch increases, the fraction of the total required time used for data processing and transmission/reception over the network decreases. For example, since the number of communications between processors increased at 30,000 bch, these atypical factors intervened and increased the K-factor compared with the result at 30 bch. From this point of view, it is possible to explain why no significant reduction was observed in every case as bch decreased, and why the heterogeneity of the results increased. For the same reason, the decrease in the K-factor with decreasing bch converges to a constant value.
However, there are results that are not easily explained. In the case of the cluster consisting of three processors, the average CPU time was markedly reduced even though the same number of processors was used as in a single node. We can only infer that these results arise from computational factors, but it is clear that parallel processing through MPI was effective, as there was a maximum performance increase of 70.6% when computing in a cluster.
From the results mentioned above, the efficiency of the cluster can be improved by minimizing the contribution of unstructured factors other than the time required for the actual computation; therefore, bch should be minimized and set according to the number of processors mobilized. According to the model used in this study, Equation (3), and our cluster, the factors affecting the speedup factor are the K-factor and the number of processors, and the MPI performance is expected to converge to a certain value. This is a notable difference from the research results of Colasanti et al., who performed parallel computing with an existing Monte Carlo code, and those of Deng and Xie [14,16]. In particular, we note Deng and Xie's result that the speedup factor increased linearly, with errors of 5% and 13%, even when the number of processors increased to 32 and 64. The difference may be due to the increase in the K-factor caused by the use of secondary equipment, such as switching hubs, when grouping physically separated nodes into one cluster, and by the time delay caused by the difference in computing power between nodes, unlike in previous studies using processors within a single computer [16].
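Under the model in Equation (3), the attainable speedup is bounded by the reciprocal of the K-factor:

$$ \lim_{S \to \infty} \frac{1}{K + \dfrac{1 - K}{S}} = \frac{1}{K} $$

so that, for example, even the smallest fitted value of K = 0.258 (30 bch) limits the cluster to roughly a 3.9-fold speedup over the three-processor baseline, no matter how many processors are added.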
However, performance is enhanced when the system is configured as a cluster rather than as a single node. Even if the performance increase converges to a certain value, a significant level of improvement can be achieved by combining the scale effect of the cluster with improvements in the computing power of the individual nodes.

5. Conclusions

In this study, IAEA phase space data were verified as a radiation source; their accuracy was confirmed by comparing PHITS calculations against ionization chamber measurements. We expect that phsp data can serve as a conveniently accessible source for radiotherapists, medical physicists, and anyone interested in radiation protection and dose estimation in radiotherapy. Additionally, the MPI function provided by the PHITS code package was evaluated. While the increase in computational power with the number of processors was confirmed, the cluster, composed of separate computers, had clear limitations stemming from differences in node performance and from parts of the work that cannot be parallelized, such as data transmission, reception, and collection. Considering that the Monte Carlo method depends on the performance of the CPU used for the calculation, it seems possible to overcome these limitations by improving the CPU performance of the nodes constituting the cluster.
In the future, measures to reduce the K-factor should be devised through changes in communication methods between nodes and the efficient operation of data transmission and reception. Furthermore, it is necessary to study cluster efficiency in complex simulations, such as radiation treatment computer simulations, through PHITS code and phsp data.

Author Contributions

Conceptualization, H.-J.G. and K.-B.K.; methodology, H.-J.G. and K.-B.K.; software, H.-J.G.; validation, S.K., S.-B.H. and K.-B.K.; investigation, H.-J.G., S.-B.H., S.K. and K.-B.K.; data curation, H.-J.G. and K.-B.K.; writing—original draft preparation, H.-J.G. and K.-B.K.; writing—review and editing, H.-J.G., S.-B.H., S.K. and K.-B.K.; visualization, H.-J.G.; supervision, K.-B.K.; project administration, K.-B.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Nuclear Safety Research Program through the Korea Foundation Of Nuclear Safety (KoFONS) using the financial resource granted by the Nuclear Safety and Security Commission (NSSC) of the Republic of Korea (No. 2202012, No. 2103088).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest

Author Hyeok-Jun Gwon was employed by KIRAMS. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Sohrabpour, M.; Hassanzadeh, M.; Shahriari, M.; Sharifzadeh, M. Gamma irradiator dose mapping simulation using the MCNP code and benchmarking with dosimetry. Appl. Radiat. Isot. 2002, 57, 537–542.
  2. Werner, C.J.; Bull, J.S.; Solomon, C.J.; Brown, F.B.; McKinney, G.W.; Rising, M.E.; Dixon, D.A.; Martz, R.L.; Hughes, H.G.; Cox, L.J.; et al. MCNP Version 6.2 Release Notes; Los Alamos National Lab. (LANL): Los Alamos, NM, USA, 2018.
  3. Battistoni, G.; Boehlen, T.; Cerutti, F.; Chin, P.W.; Esposito, L.S.; Fassò, A.; Ferrari, A.; Lechner, A.; Empl, A.; Mairani, A.; et al. Overview of the FLUKA code. Ann. Nucl. Energy 2015, 82, 10–18.
  4. Guatelli, S.; Cutajar, D.; Oborn, B.; Rosenfeld, A.B. Introduction to the Geant4 Simulation toolkit. In AIP Conference Proceedings; American Institute of Physics: College Park, MD, USA, 2011; Volume 1345, pp. 303–322.
  5. Furuta, T.; Sato, T. Medical application of particle and heavy ion transport code system PHITS. Radiol. Phys. Technol. 2021, 14, 215–225.
  6. Berris, T.; Mazonakis, M.; Stratakis, J.; Tzedakis, A.; Fasoulaki, A.; Damilakis, J. Calculation of Organ Doses from Breast Cancer Radiotherapy: A Monte Carlo Study. J. Appl. Clin. Med. Phys. 2013, 14, 133–146.
  7. Bednarz, B.; Xu, X.G. Monte Carlo modeling of a 6 and 18 MV Varian Clinac medical accelerator for in-field and out-of-field dose calculations: Development and validation. Phys. Med. Biol. 2009, 54, N43–N57.
  8. Capote, R.; Jeraj, R.; Ma, C.M.; Rogers, D.W.; Sánchez-Doblado, F.; Sempau, J.; Seuntjens, J.; Siebers, J.V. Phase-Space Database for External Beam Radiotherapy. Summary Report of a Consultants' Meeting; International Atomic Energy Agency: Vienna, Austria, 2006.
  9. Al-Affan, I.A. A comparison of speeds of personal computers using an x-ray scattering Monte Carlo benchmark. Phys. Med. Biol. 1996, 41, 309.
  10. Zhu, C.; Liu, Q. Review of Monte Carlo modeling of light transport in tissues. J. Biomed. Opt. 2013, 18, 050902.
  11. Polo, I.O.; Santos, W.S.; de Lara Antonio, P.; Caldas, L.V. Variance reduction technique in a beta radiation beam using an extrapolation chamber. Appl. Radiat. Isot. 2017, 128, 154–157.
  12. Alerstam, E.; Svensson, T.; Andersson-Engels, S. Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration. J. Biomed. Opt. 2008, 13, 060504.
  13. Sazali, M.A.; Sarkawi, M.S.; Ali, N.S. Multiprocessing implementation for MCNP using Python. In IOP Conference Series: Materials Science and Engineering; IOPscience: Bristol, UK, 2022; Volume 1231, p. 012003.
  14. Colasanti, A.; Guida, G.; Kisslinger, A.; Liuzzi, R.; Quarto, M.; Riccio, P.; Roberti, G.; Villani, F. Multiple processor version of a Monte Carlo code for photon transport in turbid media. Comput. Phys. Commun. 2000, 132, 84–93.
  15. Wagner, J.C.; Haghighat, A. Parallel MCNP Monte Carlo transport calculations with MPI. Trans. Am. Nucl. Soc. 1996, 75, CONF-961103.
  16. Deng, L.; Xie, Z.S. Parallelization of MCNP Monte Carlo neutron and photon transport code in parallel virtual machine and message passing interface. J. Nucl. Sci. Technol. 1999, 36, 626–629.
  17. Mark, D.U.; Mohd, H.R.; Mohd, A.S.; Mohamad, P.A. Performance of MPI parallel processing implemented by MCNP5/MCNPX for criticality benchmark problems. In Proceedings of the R and D Seminar 2012: Research and Development Seminar, 2012. Available online: https://inis.iaea.org/search/search.aspx?orig_q=RN:44096876 (accessed on 29 January 2023).
  18. McConn, R.J.; Gesh, C.J.; Pagh, R.T.; Rucker, R.A.; Williams, R., III. Compendium of Material Composition Data for Radiation Transport Modeling; Pacific Northwest National Lab. (PNNL): Richland, WA, USA, 2011.
  19. Venselaar, J.; Welleweerd, H.; Mijnheer, B. Tolerances for the accuracy of photon beam dose calculations of treatment planning systems. Radiother. Oncol. 2001, 60, 191–201.
  20. Ekstrand, K.E.; Barnes, W.H. Pitfalls in the use of high energy X rays to treat tumors in the lung. Int. J. Radiat. Oncol. Biol. Phys. 1990, 18, 249–252.
  21. Amdahl, G.M. Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the Spring Joint Computer Conference, New York, NY, USA, 18–20 April 1967; Volume 18, pp. 483–485.
Figure 1. Water phantom and radiation source setup: water with a volume of 1 m³ was implemented based on Pacific Northwest National Laboratory material information at an SSD of 100 cm.
Figure 2. Local network configuration: Three physically separated computers were used. Individual nodes have the same performance in terms of RAM and CPU, and data communication is possible through a switching hub.
Figure 3. Schematic diagram of PC cluster construction: individual nodes were integrated into one cluster through the processes of (A) MPICH2 installation and registration, (B) local network construction, and (C) PHITS code compilation.
Figure 4. Superimposed beam profiles of the measured and simulated values for the Varian Clinac 6 MV: the in-plane profiles were evaluated by field size (4 cm × 4 cm, 10 cm × 10 cm) at depths of 15 mm, 50 mm, 100 mm, and 200 mm.
Figure 5. Superimposed beam profiles of the measured and simulated values for the Varian Clinac 6 MV: the cross-plane profiles were evaluated by field size (4 cm × 4 cm, 10 cm × 10 cm) at depths of 15 mm, 50 mm, 100 mm, and 200 mm.
Figure 6. Superimposed measured and simulated PDD values for Varian Clinac 6MV: It was evaluated by dividing by field size (4 cm × 4 cm, 10 cm × 10 cm).
Figure 7. Individual node performance was evaluated using only three processors each. The average value of five average CPU times required to repeat 3 × 10⁸ computational simulations for the PDD in a 10 cm × 10 cm field size was analyzed.
Figure 8. Speedup factor increase trend according to bch change: the average value of five average CPU times required to perform 3 × 10⁸ iterations of the computational simulation for the PDD in the 10 cm × 10 cm field size was analyzed.
Table 1. Performance of the PCs that make up the cluster: the RAM, clock speed (GHz), and number of processors that can affect the simulation results are listed. Since the computational performance is affected by differences in CPU performance, the clock speed of Node 2 was limited to a level similar to that of the other CPUs.
Node      CPU Core                          RAM     No. of Processors
Node 1    Intel i3-540 3.07 GHz             4 GB    4
Node 2    Intel i3-3220 3.3 GHz (3.1 GHz)   4 GB    4
Node 3    Intel i3-2100 3.10 GHz            4 GB    4
Table 2. Beam profile analysis of the measured and simulated values for the Varian Clinac 6 MV: data for the 4 cm × 4 cm field size were analyzed at depths of 15 mm, 50 mm, 100 mm, and 200 mm. Each part of the beam profile was analyzed in the order δ2, δ3, δ50–90, δ4. All values were within the criteria except for δ2, which represents the high-dose, high-gradient region.
            In-Plane                          Cross-Plane                       Reference Value
            15 mm   50 mm   100 mm   200 mm   15 mm   50 mm   100 mm   200 mm
δ2          22.3%   34.8%   7.3%     6.1%     10.4%   17.4%   28.9%    26.5%    3%
δ3          0.5%    1.2%    0.3%     0.5%     0.4%    1.2%    0.7%     0.2%     2%
δ50–90      2 mm    2 mm    1 mm     1 mm     2 mm    1 mm    1 mm     1 mm     2 mm
δ4          15.1%   23.3%   7.3%     4.7%     22.5%   13.1%   10.3%    11.2%    30%
Table 3. Beam profile analysis of the measured and simulated values for the Varian Clinac 6 MV: data for the 10 cm × 10 cm field size were analyzed at depths of 15 mm, 50 mm, 100 mm, and 200 mm. Each part of the beam profile was analyzed in the order δ2, δ3, δ50–90, δ4. In this case, δ4 slightly exceeded the reference value of 30%, and δ2 also exceeded its criterion.
            In-Plane                          Cross-Plane                       Reference Value
            15 mm   50 mm   100 mm   200 mm   15 mm   50 mm   100 mm   200 mm
δ2          22.3%   19.4%   5.7%     13.9%    6.0%    12.5%   11.2%    5.3%     3%
δ3          0.5%    0.9%    0.5%     0.5%     0.6%    0.9%    0.8%     0.5%     2%
δ50–90      2 mm    1 mm    2 mm     1 mm     2 mm    1 mm    2 mm     1 mm     2 mm
δ4          15.1%   18.8%   5.5%     3.4%     30.1%   22.2%   4.6%     15.7%    30%
Table 4. PDD evaluation of the measured and simulated values for the Varian Clinac 6 MV, evaluated by field size (4 cm × 4 cm, 10 cm × 10 cm). For δ1, which represents the fall-off region after the build-up area, both field sizes were within the criterion. The discrepancy of the build-up region, δ2, was 2 mm and 0 mm, respectively.
        4 cm × 4 cm    10 cm × 10 cm    Reference Value
δ1      2.0%           1.1%             2%
δ2      2 mm           0 mm             2 mm
Table 5. Prior to the cluster evaluation, calculations were performed with three processors on each node to compare individual node performance. The node CPU time (h) is the average of five repetitions of the same operation; Avg is the average value and S.D is the standard deviation of the measured values.
No. of bch     Node 1 Avg   S.D     Node 2 Avg   S.D     Node 3 Avg   S.D
30 bch         6.0          0.04    5.61         0.03    5.79         0.02
300 bch        6.01         0.03    5.63         0.05    5.81         0.03
3000 bch       6.17         0.18    5.93         0.04    6.08         0.02
30,000 bch     6.35         0.09    5.97         0.2     6.14         0.13
Table 6. CPU time (h) of a cluster consisting of three processors. To exclude the effect of differences in node performance, one processor was mobilized from each node. As in the individual node evaluation, the same simulation was repeated five times and the results were averaged; Avg is the average value and S.D is the standard deviation of the measured values.
No. of Processors    30 bch Avg   S.D     300 bch Avg   S.D     3000 bch Avg   S.D     30,000 bch Avg   S.D
3                    3.63         0.02    3.6           0.01    3.68           0.02    3.72             0.02
Table 7. Speedup factor. The comparison was made relative to three processors as the number of processors increased to 6, 9, and 12. To exclude the effect of performance differences between nodes, processors were added one from each node. The results of five iterations were averaged; Avg is the average value and S.D is the standard deviation of the measured values.
No. of Processors    30 bch Avg   S.D     300 bch Avg   S.D     3000 bch Avg   S.D     30,000 bch Avg   S.D
3                    1            0.01    1             0.01    1              0.01    1                0.01
6                    1.64         0.01    1.6           0.01    1.59           0.01    1.55             0.02
9                    1.93         0.02    1.91          0.01    1.8            0.02    1.51             0.01
12                   2.27         0.07    2.29          0.02    2.07           0.07    1.81             0.05
Table 8. K-factor. Since it represents the portion of the work, such as data communication and collection, that cannot be parallelized in the configured cluster, it was calculated using the least squares method.
            30 bch    300 bch    3000 bch    30,000 bch
K-Factor    0.258     0.259      0.313       0.402
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
