Reducing Computational Time in Pixel-Based Path Planning for GMA-DED by Using Multi-Armed Bandit Reinforcement Learning Algorithm
Abstract
1. Introduction
2. AI Reinforcement Learning Background Applied to Path Planning Strategies in GMA-DED
3. Proposal of a Trajectory Planning Strategy Assisted by AI Reinforcement Learning Using a MAB: The Case of the Advanced-Pixel
3.1. The Enhanced-Pixel Strategy
Algorithm 1 Enhanced-Pixel Strategy

    INPUT: Number of loops t
    OUTPUT: BestValues matrix
    Initialize Matrix1[10][ ] ← empty
    Initialize BestValues[ ] ← empty
    Order nodes in X direction (AOx)
    Order nodes in Y direction (AOy)
    for i = 1 to t step 1 do
        Choose a random start node
        ▷ Generate trajectories along the AOx direction
        for each heuristic h ∈ {NNH, AH, BH, RCH, CH} do
            Generate trajectory with h starting from the random node
            Apply 2-opt optimisation to the trajectory
            Calculate distance d
            Store d in Matrix1
        end for
        ▷ Generate trajectories along the AOy direction
        for each heuristic h ∈ {NNH, AH, BH, RCH, CH} do
            Generate trajectory with h starting from the random node
            Apply 2-opt optimisation to the trajectory
            Calculate distance d
            Store d in Matrix1
        end for
        Extract the minimum distance dmin from Matrix1
        Store dmin in the BestValues matrix
    end for
    return BestValues
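To make the control flow of Algorithm 1 concrete, a minimal Python sketch is provided below. It reproduces only the bookkeeping of the Enhanced-Pixel loop; `build_trajectory`, `two_opt`, and `trajectory_length` are hypothetical placeholders for the heuristic constructions (NNH, AH, BH, RCH, CH), the 2-opt refinement, and the distance calculation, and do not correspond to the authors' actual implementation.

```python
import random

# Hypothetical stand-ins; real versions would implement the NNH/AH/BH/RCH/CH
# constructions, the 2-opt local search, and the Euclidean tour length.
def build_trajectory(nodes, heuristic, start):
    return list(nodes)               # placeholder: a real heuristic would order the nodes

def two_opt(trajectory):
    return trajectory                # placeholder: a real 2-opt would shorten the route

def trajectory_length(trajectory):
    return sum(((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
               for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:]))

def enhanced_pixel(nodes, loops, heuristics=("NNH", "AH", "BH", "RCH", "CH")):
    """Exhaustive Enhanced-Pixel loop: every heuristic is evaluated in both axis orders."""
    ao_x = sorted(nodes, key=lambda p: (p[0], p[1]))   # nodes ordered along X (AOx)
    ao_y = sorted(nodes, key=lambda p: (p[1], p[0]))   # nodes ordered along Y (AOy)
    best_values = []
    for _ in range(loops):
        start = random.choice(nodes)
        distances = []
        for axis_order in (ao_x, ao_y):                # 2 axis orders x 5 heuristics = 10 candidates
            for h in heuristics:
                trajectory = two_opt(build_trajectory(axis_order, h, start))
                distances.append(trajectory_length(trajectory))
        best_values.append(min(distances))             # keep the best of the 10 candidates
    return best_values
```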
3.2. The Advanced-Pixel Strategy
Algorithm 2 Advanced-Pixel Strategy—MAB-Based Trajectory Planning

    INPUT: Number of iterations n
    INPUT: Hyperparameters of the MAB policy tool (ϵ-greedy, UCB, TS)
    OUTPUT: BestValueMatrix
    Define the set of AO-HTP combinations: H = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10}
    Initialize Q(h) = 0 for all h ∈ H    ▷ Value function for each AO-HTP combination
    Initialize N(h) = 0 for all h ∈ H    ▷ Counter of selections for each h
    Initialize BestValueMatrix ← empty
    for t = 1 to n do
        Select a random starting node
        ▷ Choose an AO-HTP combination according to the MAB policy tool
        Choose h ∈ H based on the policy (e.g., ϵ-greedy, UCB, TS)
        Generate trajectory using combination h
        Calculate trajectory distance Dt(h)
        ▷ Update statistics for h
        N(h) ← N(h) + 1
        Q(h) ← Q(h) + [Dt(h) − Q(h)] / N(h)
        ▷ Store trajectory distance in BestValueMatrix
        Store Dt(h) in BestValueMatrix
    end for
    return BestValueMatrix
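A compact Python sketch of the MAB loop in Algorithm 2 follows. It assumes that the value function Q(h) tracks the running mean trajectory distance of each AO-HTP arm (lower is better) and is updated with the standard incremental-mean rule; `evaluate_arm` is a hypothetical callback meaning "generate a trajectory with combination h and measure its length", and the selection policy is injected so that any of the tools in Sections 3.2.1–3.2.3 can be plugged in.

```python
import random

def advanced_pixel(evaluate_arm, policy, n_arms=10, n_iterations=500):
    """Generic MAB loop over the ten AO-HTP arms: Q holds the running mean
    trajectory distance per arm (lower is better), N the selection counts."""
    q = [0.0] * n_arms
    n = [0] * n_arms
    best_values = []
    for t in range(1, n_iterations + 1):
        arm = policy(q, n, t)                    # epsilon-greedy, UCB or TS chooses the arm
        distance = evaluate_arm(arm)             # build a trajectory with that AO-HTP, measure it
        n[arm] += 1
        q[arm] += (distance - q[arm]) / n[arm]   # incremental mean update of the value function
        best_values.append(distance)
    return best_values, q, n

# Illustrative run with a dummy evaluator and a purely random policy (both hypothetical):
if __name__ == "__main__":
    dummy = lambda arm: 800.0 + 10.0 * arm + random.random()
    values, q, n = advanced_pixel(dummy, policy=lambda q, n, t: random.randrange(len(q)))
    print(min(values))
```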
3.2.1. ε-Greedy Policy Tool
Algorithm 3 ϵ-Greedy Policy for Selecting AO-HTP Combinations

    INPUT: Set of AO-HTP combinations H = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10}
    INPUT: Value function Qt(h) for all h ∈ H
    INPUT: Exploration probability ϵ (0 ≤ ϵ ≤ 1)
    function EpsilonGreedySelection(H, Q, ϵ)
        Generate a random number r in [0, 1]
        if r < ϵ then    ▷ Exploration: choose randomly
            Randomly select h from H
        else             ▷ Exploitation: choose the best option
            Select the h ∈ H with the best value Qt(h) (lowest expected trajectory distance)
        end if
        return h
    end function
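Under the same conventions, the ε-greedy selection of Algorithm 3 can be sketched as follows. The guard that forces every untried arm to be evaluated once is an implementation convenience added here, not part of the pseudocode; exploitation picks the arm with the lowest estimated distance.

```python
import random

def epsilon_greedy(q, n, t, epsilon=0.3):
    """Explore with probability epsilon, otherwise exploit the arm whose
    estimated mean distance is lowest."""
    untried = [arm for arm, pulls in enumerate(n) if pulls == 0]
    if untried:                                          # convenience guard: try every arm once
        return random.choice(untried)
    if random.random() < epsilon:                        # exploration branch
        return random.randrange(len(q))
    return min(range(len(q)), key=lambda arm: q[arm])    # exploitation branch
```

A decaying schedule such as the "ε = 1 with 1% decay" configuration tested later could be emulated by passing, for example, epsilon = 0.99 ** t, although the exact decay rule used in the study is not reproduced here.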
3.2.2. Upper Confidence Bound (UCB) Policy Tool
Algorithm 4 UCB Policy for Selecting AO-HTP Combinations

    INPUT: Set of AO-HTP combinations H = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10}
    INPUT: Value function Qt(h) for all h ∈ H
    INPUT: Number of iterations t
    INPUT: Exploration hyperparameter c (positive value)
    INPUT: Nt(h), number of times each combination has been selected
    function UCBSelection(H, Q, t, c, Nt)
        for each h ∈ H do
            if Nt(h) = 0 then    ▷ Untried combination: evaluate it once
                Generate trajectory using combination h
                Calculate trajectory distance Dt(h)
                Nt(h) ← Nt(h) + 1
                Qt(h) ← Dt(h)
            else                 ▷ Confidence-bound score for distance minimisation
                UCBt(h) ← Qt(h) − c · sqrt(ln t / Nt(h))
            end if
        end for
        Select the h ∈ H with the best UCBt(h) (lowest confidence-bound score)
        return h
    end function
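A possible Python rendering of the UCB policy is given below, written for the distance-minimisation setting assumed above: the confidence term c·sqrt(ln t / N(h)) is subtracted so that rarely selected combinations receive optimistically short scores. Whether the study applies the bound to distances directly or to a distance-derived reward is not visible in the pseudocode, so the sign convention should be read as an assumption.

```python
import math
import random

def ucb_selection(q, n, t, c=0.5):
    """Confidence-bound selection for a minimisation objective: subtracting the
    exploration bonus makes seldom-tried arms look shorter than their mean."""
    untried = [arm for arm, pulls in enumerate(n) if pulls == 0]
    if untried:                                          # evaluate every arm at least once
        return random.choice(untried)
    scores = [q[arm] - c * math.sqrt(math.log(t) / n[arm]) for arm in range(len(q))]
    return min(range(len(scores)), key=lambda arm: scores[arm])
```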
3.2.3. Thompson Sampling (TS) Policy Tool
Algorithm 5 Thompson Sampling for AO-HTP Combination Selection

    INPUT: Set of AO-HTP combinations H = {h1, h2, h3, h4, h5, h6, h7, h8, h9, h10}
    INPUT: Alpha α = 1 and Beta β = 1 for each h ∈ H
    INPUT: Number of iterations t
    INPUT: Distance of the trajectory for each h ∈ H (from previous iterations)
    function ThompsonSampling(H, α, β, t)
        if Dt(h) < Dt−1(h) and t ≠ 1 then    ▷ The selected h improved on its previous distance
            Reward is a success
        else
            Reward is a failure
        end if
        if Reward is a success then
            α(h) ← α(h) + 1    ▷ Increment α for successful h
        else
            β(h) ← β(h) + 1    ▷ Increment β for unsuccessful h
        end if
        For each h ∈ H, sample a value Qt(h) ∼ Beta(α(h), β(h))
        Select the h ∈ H with the highest sampled Qt(h)
        return h
    end function
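Finally, the Beta-Bernoulli sampling step of Algorithm 5 can be sketched in a few lines. The success criterion (the distance obtained with the last selected arm improved on its previous value) is one reading of the comparison in the pseudocode; it is judged outside this function and passed in as a boolean, and both `last_arm` and `improved` are hypothetical parameter names.

```python
import random

def thompson_sampling_step(alpha, beta, last_arm=None, improved=None):
    """One Thompson Sampling step: optionally update the Beta parameters of the
    arm evaluated last (improved=True counts as a success), then sample one
    Beta(alpha, beta) value per arm and return the index of the highest sample."""
    if last_arm is not None:
        if improved:
            alpha[last_arm] += 1      # success: the trajectory distance got shorter
        else:
            beta[last_arm] += 1       # failure: no improvement was observed
    samples = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=lambda arm: samples[arm])

# Usage sketch: uniform Beta(1, 1) priors for the ten AO-HTP arms, first pick without feedback.
alpha, beta = [1.0] * 10, [1.0] * 10
first_arm = thompson_sampling_step(alpha, beta)
```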
4. Computational Validation of the Advanced-Pixel Strategy Assisted by an AI Reinforcement Learning Approach
4.1. Methodology to Assess Computational Efficiency Increase with AI Reinforcement Learning
4.2. Results and Discussions
5. Experimental Validation of the Advanced-Pixel Strategy Assisted by an AI Reinforcement Learning Approach
5.1. Methodology to Assess Experimentally the Efficiency and Effectiveness Increase with AI Reinforcement Learning
5.2. Results and Discussions
6. A Case Study of Using the Advanced-Pixel Strategy Assisted by an AI Reinforcement Learning Approach
7. Conclusions and Future Work
- (a) The Advanced-Pixel algorithm reaches the optimised solution faster than its predecessor (Enhanced-Pixel) because it requires fewer iterations.
- (b) Reducing the number of iterations does not degrade trajectory planning performance under the Reinforcement Learning approach. In fact, the performance gain shows that Advanced-Pixel converges, in most cases, to the shortest trajectory and to shorter printing times. However, since the solution applied in Advanced-Pixel is based on probabilistic concepts, the advanced version cannot be expected to beat its Pixel predecessor in 100% of the cases.
- (c) The sensitivity of the algorithm performance comparison increases for larger printable parts (higher number of nodes).
- (d) Therefore, the implementation of Reinforcement Learning through the MAB problem succeeded in upgrading the Pixel family of space-filling trajectory planners.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Essop, A. 3D Printing Industry. Oil and Gas Industry Consortium Completes Two Projects to Accelerate Adoption of AM. 3D Printing Industry. 2020. Available online: https://3dprintingindustry.com/news/oil-and-gas-industry-consortium-completes-two-projects-to-accelerate-adoption-of-am-169056/ (accessed on 14 May 2021).
- Singh, S.; Sharma, S.K.; Rathod, D.W. A review on process planning strategies and challenges of WAAM. Mater. Today Proc. 2021, 47, 6564–6675. [Google Scholar] [CrossRef]
- Amal, M.S.; Justus, P.C.T.; Senthilkumar, V. Simulation of wire arc additive manufacturing to find out the optimal path planning strategy. Mater. Today Proc. 2022, 66, 2405–2410. [Google Scholar] [CrossRef]
- Ding, D.; Pan, Z.; Cuiuri, D.; Li, H. A multi-bead overlapping model for robotic wire and arc additive manufacturing (WAAM). Robot. Comput. Integr. Manuf. 2015, 31, 101–110. [Google Scholar] [CrossRef]
- Hu, Z.; Qin, X.; Li, Y.; Yuan, J.; Wu, Q. Multi-bead overlapping model with varying cross-section profile for robotic GMAW-based additive manufacturing. J. Intell. Manuf. 2020, 31, 1133–1147. [Google Scholar] [CrossRef]
- Jafari, D.; Vaneker, T.H.J.; Gibson, I. Wire and arc additive manufacturing: Opportunities and challenges to control the quality and accuracy of manufactured parts. Mater. Des. 2021, 202, 109471. [Google Scholar] [CrossRef]
- Ding, D.; Pan, Z.; Cuiuri, D.; Li, H. A practical path planning methodology for wire and arc additive manufacturing of thin-walled structures. Robot. Comput. Integr. Manuf. 2015, 34, 8–19. [Google Scholar] [CrossRef]
- Ding, D.; Pan, Z.; Cuiuri, D.; Li, H.; Larkin, N. Adaptive path planning for wire-feed additive manufacturing using medial axis transformation. J. Clean. Prod. 2016, 133, 942–952. [Google Scholar] [CrossRef]
- Wang, X.; Wang, A.; Li, Y. A sequential path-planning methodology for wire and arc additive manufacturing based on a water-pouring rule. Int. J. Adv. Manuf. Technol. 2019, 103, 3813–3830. [Google Scholar] [CrossRef]
- Cox, J.J.; Takezaki, Y.; Ferguson, H.R.P.; Kohkonen, K.E.; Mulkay, E.L. Space-filling curves in tool-path applications. Comput.-Aided Des. 1994, 26, 215–224. [Google Scholar] [CrossRef]
- Vishwanath, N.; Suryakumar, S. Use of fractal curves for reducing spatial thermal gradients and distortion control. J. Manuf. Process. 2022, 81, 594–604. [Google Scholar] [CrossRef]
- Singh, S.; Singh, A.; Kapil, S.; Das, M. Utilisation of a TSP solver for generating non-retractable, direction favouring toolpath for additive manufacturing. Addit. Manuf. 2022, 59, 103126. [Google Scholar] [CrossRef]
- Ferreira, R.P.; Scotti, A. The Concept of a Novel Path Planning Strategy for Wire + Arc Additive Manufacturing of Bulky Parts: Pixel. Metals 2021, 11, 498. [Google Scholar] [CrossRef]
- Ferreira, R.P.; Vilarinho, L.O.; Scotti, A. Enhanced-pixel strategy for wire arc additive manufacturing trajectory planning: Operational efficiency and effectiveness analyses. Rapid Prototyp. J. 2024, 30, 1–15. [Google Scholar] [CrossRef]
- Ferreira, R.P.; Schubert, E.; Scotti, A. Exploring Multi-Armed Bandit (MAB) as an AI Tool for Optimising GMA-WAAM Path Planning. J. Manuf. Mater. Process. 2024, 8, 99. [Google Scholar] [CrossRef]
- Kumar, S.; Gopi, T.; Harikeerthana, N.; Gupta, M.K.; Gaur, V.; Krolczyk, G.M.; Wu, C. Machine learning techniques in additive manufacturing: A state of the art review on design, processes and production control. J. Intell. Manuf. 2023, 34, 21–55. [Google Scholar] [CrossRef]
- Wang, Y.; Xu, X.; Zhao, Z.; Deng, W.; Han, J.; Bai, L.; Liang, X.; Yao, J. Coordinated monitoring and control method of deposited layer width and reinforcement in WAAM process. J. Manuf. Process. 2021, 71, 306–316. [Google Scholar] [CrossRef]
- Mattera, G.; Caggiano, A.; Nele, L. Optimal data-driven control of manufacturing processes using reinforcement learning: An application to wire arc additive manufacturing. J. Intell. Manuf. 2025, 36, 1291–1310. [Google Scholar] [CrossRef]
- Petrik, J.; Bambach, M. RLTube: Reinforcement learning based deposition path planner for thin-walled bent tubes with optionally varying diameter manufactured by wire-arc additive manufacturing. Manuf. Lett. 2024, 40, 31–36. [Google Scholar] [CrossRef]
- Petrik, J.; Bambach, M. Reinforcement learning and optimisation based path planning for thin-walled structures in wire arc additive manufacturing. J. Manuf. Process. 2023, 93, 75–89. [Google Scholar] [CrossRef]
- Singh, V.; Chen, S.-S.; Singhania, M.; Nanavati, B.; Kar, A.K.; Gupta, A. How are reinforcement learning and deep learning algorithms used for big data based decision making in financial industries–A review and research agenda. Int. J. Inf. Manag. Data Insights 2022, 2, 100094. [Google Scholar] [CrossRef]
- Hutsebaut-Buysse, M.; Mets, K.; Latré, S. Hierarchical Reinforcement Learning: A Survey and Open Research Challenges. Mach. Learn. Knowl. Extr. 2022, 4, 172–221. [Google Scholar] [CrossRef]
- Bouneffouf, D.; Rish, I.; Aggarwal, C. Survey on Applications of Multi-Armed and Contextual Bandits. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 24 July 2020; pp. 1–8. [Google Scholar] [CrossRef]
- Silva, N.; Werneck, H.; Silva, T.; Pereira, A.C.M.; Rocha, L. Multi-Armed Bandits in Recommendation Systems: A survey of the state-of-the-art and future directions. Expert. Syst. Appl. 2022, 197, 116669. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Almasri, M.; Mansour, A.; Moy, C.; Assoum, A.; Le Jeune, D.; Osswald, C. Distributed competitive decision making using multi-armed bandit algorithms. Wirel. Pers. Commun. 2021, 118, 1165–1188. [Google Scholar] [CrossRef]
- Russo, D.J.; Roy, B.V.; Kazerouni, A.; Osband, I.; Wen, Z. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 2018, 11, 1–96. Available online: https://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf (accessed on 30 May 2023). [CrossRef]
- Martín, M.; Jiménez-Martín, A.; Mateos, A.; Hernández, J.Z. Improving A/B Testing on the Basis of Possibilistic Reward Methods: A Numerical Analysis. Symmetry 2021, 13, 2175. [Google Scholar] [CrossRef]
- Jain, S.; Bhat, S.; Ghalme, G.; Padmanabhan, D.; Narahari, Y. Mechanisms with learning for stochastic multi-armed bandit problems. Indian J. Pure. Appl. Math. 2016, 47, 229–272. [Google Scholar] [CrossRef]
- Gupta, A.K.; Nadarajah, S. Handbook of Beta Distribution and Its Applications; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar]
- Dirkx, R.; Dimitrakopoulos, R. Optimising infill drilling decisions using multi-armed bandits: Application in a long-term, multi-element stockpile. Math. Geosci. 2018, 50, 35–52. [Google Scholar] [CrossRef]
- Mignon, A.; Rocha, A.; Luis, R. An Adaptive Implementation of ε-Greedy in Reinforcement Learning. Procedia Comput. Sci. 2017, 109, 1146–1151. [Google Scholar] [CrossRef]
- Koch, P.H.; Rosenkranz, J. Sequential decision-making in mining and processing based on geometallurgical inputs. Miner. Eng. 2020, 149, 106262. [Google Scholar] [CrossRef]
- Marković, D.; Stojić, H.; Schwöbel, S.; Kiebel, S.J. An empirical evaluation of active inference in Multi-Armed Bandits. Neural Netw. 2021, 144, 229–246. [Google Scholar] [CrossRef]
- Li, Y.; Han, Q.; Zhang, G.; Horváth, I. A layers-overlapping strategy for robotic wire and arc additive manufacturing of multi-layer multi-bead components with homogeneous layers. Int. J. Adv. Manuf. Technol. 2018, 96, 3331–3344. [Google Scholar] [CrossRef]
- Cui, J.; Yuan, L.; Commins, P.; He, F.; Wang, J.; Pan, Z. WAAM process for metal block structure parts based on mixed heat input. Int. J. Adv. Manuf. Technol. 2021, 113, 503–521. [Google Scholar] [CrossRef]
- Tao, W.; Leu, M.C. Design of lattice structure for additive manufacturing. In Proceedings of the International Symposium on Flexible Automation (ISFA), Cleveland, OH, USA, 1–8 August 2016; pp. 325–332. [Google Scholar] [CrossRef]
Reinforcement Learning Terms | Definition |
---|---|
Environment | The place where the agent gathers information, interacts with its surroundings and acquires knowledge through learning processes. |
Agent | The entity that takes actions affecting the environment. |
Action | The set of all possible operations/moves the agent can make. |
Episode | A set of interactions between the agent and the environment during a single run of the algorithm. |
Reward and Regret | A feedback signal provided by the environment to the learning agent (positive feedback is termed a reward; negative feedback, a regret). |
Exploration | To gather information to understand the environment better. |
Exploitation | Using the knowledge gathered during exploration to reach the target results. |
Policy | The set of rules an agent follows, at a given state, to select an action that maximises reward and avoids regret. |
Value function | The metric used to estimate the expected return or cumulative reward an agent can obtain in a given state, ruled by the policy. |
Strategy | Policy Tool | Part 1: Trajectory Distance (mm) After 500 Iterations (*) | Part 1: Iterations to Converge | Part 1: Regret | Part 2: Trajectory Distance (mm) After 500 Iterations (*) | Part 2: Iterations to Converge | Part 2: Regret | Part 3: Trajectory Distance (mm) After 500 Iterations (*) | Part 3: Iterations to Converge | Part 3: Regret
---|---|---|---|---|---|---|---|---|---|---
Enhanced-Pixel | | 845.4 | 450 | 8364.8 | 1461.9 | 277 | 17,505.6 | 1904.7 | 75 | 16,374.5
Advanced-Pixel | ε-greedy, ε = 0.3 | 844.3 | 243 | 5102.61 | 1461.9 | 94 | 11,547.3 | 1904.0 | 491 | 10,533.7
Advanced-Pixel | ε-greedy, ε = 0.5 | 845.9 | 223 | 5661.1 | 1461.9 | 69 | 14,459.4 | 1900.0 | 425 | 12,565.9
Advanced-Pixel | ε-greedy, ε = 1 with 1% decay | 844.3 | 355 | 4899.8 | 1465.4 | 154 | 9884.5 | 1901.2 | 353 | 9190.8
Advanced-Pixel | UCB, c = 0.3 | 845.9 | 23 | 3956.3 | 1461.9 | 53 | 9904.3 | 1901.1 | 36 | 6737.7
Advanced-Pixel | UCB, c = 0.5 | 844.3 | 111 | 4343.7 | 1475.5 | 378 | 4005.3 | 1901.1 | 22 | 10,911.5
Advanced-Pixel | UCB, c = 3 | 844.3 | 341 | 3815.3 | 1461.9 | 31 | 9518.7 | 1901.1 | 150 | 7878.9
Advanced-Pixel | UCB, c = 5 | 844.3 | 113 | 4383.3 | 1475.5 | 63 | 4407.2 | 1901.1 | 421 | 8762.6
Advanced-Pixel | TS | 845.9 | 31 | 4229.4 | 1461.9 | 132 | 10,643 | 1901.1 | 213 | 8797.4
Parameter | Value
---|---
Process | Kinects (a Cold Metal Transfer technology from Abicor Binzel)
Arc welding equipment | iRob 501 Pro (Abicor Binzel)
Torch movement system | ABB Robot IRB 1520 ID
Substrate | SAE 1020 carbon steel (200 × 200 × 12 mm)
Substrate cooling | Natural air cooling
Wire | AWS ER70S-6, ϕ 1.2 mm
Shielding gas | Ar + 2% CO2, 15 L/min
CTWD * | 12 mm
Deposition speed (travel speed) | 48.0 cm/min
Set wire feed speed | 3.7 m/min
Set voltage | 15.2 V
Set current | 136 A
Interlayer temperature | >80 °C (around the whole top surface area)
Part Number | Layers | Criteria | Enhanced-Pixel | Advanced-Pixel |
---|---|---|---|---|
1 | Odd | Trajectory Distance (mm) | 845.81 | 845.56
1 | Odd | Printing Time (s) | 125.07 | 124.36
1 | Even | Trajectory Distance (mm) | 846.45 | 845.56
1 | Even | Printing Time (s) | 125.11 | 124.35
2 | Odd | Trajectory Distance (mm) | 1462.67 | 1462.15
2 | Odd | Printing Time (s) | 215.19 | 215.19
2 | Even | Trajectory Distance (mm) | 1463.58 | 1462.33
2 | Even | Printing Time (s) | 215.19 | 214.82
3 | Odd | Trajectory Distance (mm) | 1903.96 | 1903.39
3 | Odd | Printing Time (s) | 289.01 | 284.14
3 | Even | Trajectory Distance (mm) | 1902.91 | 1902.33
3 | Even | Printing Time (s) | 288.49 | 284.10
Part | Layers | Criteria | Enhanced-Pixel | Advanced-Pixel |
---|---|---|---|---|
Angled grid (Figure 12) | Odd | Trajectory distance (mm) | 9525.08 | 9420.39
Angled grid (Figure 12) | Odd | Printing time (s) | 1609 | 1523
Angled grid (Figure 12) | Even | Trajectory distance (mm) | 9546.08 | 9422.64
Angled grid (Figure 12) | Even | Printing time (s) | 1620 | 1556