Soft Actor-Critic Reinforcement Learning Improves Distillation Column Internals Design Optimization
Abstract
1. Introduction
1.1. The Challenge: Distillation Column Internals Design
1.2. The Solution: SAC Algorithm-Based Design Optimization
2. Methodology
2.1. Environment: Multistage Distillation Column as RadFrac in AspenPlus®
2.2. Distillation Column Flooding
2.3. Notation
2.4. SAC RL
2.5. Implementation: OpenAI Gymnasium, PyTorch (Version 2.6.0+cu118)
2.5.1. Action Variables
2.5.2. State Variables
2.5.3. Reward Scheme
2.5.4. Hardware Setup
2.5.5. Code and Documentation of the Work Performed
2.6. SAC RL Runs
2.7. Data Collection and Analysis
3. Results and Discussion
3.1. Feasibility of Using SAC RL for Distillation Column Internals Design
3.2. Effect of Reward Scheme on the Performance of SAC
3.3. Effect of Column Diameter on the Performance of SAC
3.4. Effect of SAC Hyperparameters on Performance
3.5. Comparison of SAC RL Algorithm with Established Optimization Algorithms
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Action/State Variable | Description | Value Range | Units
---|---|---|---
TOP downcomer clearance (action) | Distance between the bottom edge of the downcomer and the tray below | [30, 150] | mm
TOP tray spacing (action) | Distance between two consecutive trays | [1, 7] | ft
TOP weir height (action) | Height of the tray outlet weir, which regulates the amount of liquid build-up on the tray surface | [10, 150] | mm
TOP sieve hole diameter (action) | Diameter of the holes on the sieve tray | [5, 15] | mm
TOP weir side length (action) | Length of the tray outlet weir | [0.1, 1] | ft
BOT downcomer clearance (action) | Distance between the bottom edge of the downcomer and the tray below | [30, 150] | mm
BOT tray spacing (action) | Distance between two consecutive trays | [1, 7] | ft
BOT weir height (action) | Height of the tray outlet weir, which regulates the amount of liquid build-up on the tray surface | [10, 150] | mm
BOT sieve hole diameter (action) | Diameter of the holes on the sieve tray | [5, 15] | mm
BOT weir side length (action) | Length of the tray outlet weir | [0.1, 1] | ft
Column diameter ** (action) | Diameter of the column | [4, 9] | ft
TOP percent flooding (state) | Percent approach to flooding in the top (TOP) column section | — | %
BOT percent flooding (state) | Percent approach to flooding in the bottom (BOT) column section | — | %
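To make the variable bounds in the table concrete, the sketch below shows how the eleven design actions and the two flooding states could be declared as Gymnasium spaces. The class name, array layout, and variable ordering are illustrative assumptions, not the authors' actual environment implementation (which wraps the Aspen Plus RadFrac model).

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

# Bounds follow the action/state variable table above (TOP internals, BOT internals, column diameter).
ACTION_LOW = np.array(
    [30.0, 1.0, 10.0, 5.0, 0.1,   # TOP: downcomer clearance (mm), tray spacing (ft), weir height (mm), hole diameter (mm), weir side length (ft)
     30.0, 1.0, 10.0, 5.0, 0.1,   # BOT: same five internals
     4.0],                        # column diameter (ft)
    dtype=np.float32,
)
ACTION_HIGH = np.array(
    [150.0, 7.0, 150.0, 15.0, 1.0,
     150.0, 7.0, 150.0, 15.0, 1.0,
     9.0],
    dtype=np.float32,
)

class ColumnInternalsEnv(gym.Env):
    """Illustrative sketch of a column-internals design environment."""

    def __init__(self):
        super().__init__()
        # 11 continuous design (action) variables, bounded as in the table
        self.action_space = spaces.Box(low=ACTION_LOW, high=ACTION_HIGH, dtype=np.float32)
        # 2 state variables: percent flooding of the TOP and BOT column sections
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(2,), dtype=np.float32)
```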
Reward Model | Definition
---|---
Scheme 1 | 
Scheme 2 | 
Scheme 3 | 
Scheme 4 | Similar to Scheme 3, but with reward values lower by a factor of 10
Scheme 5 | The reward is built on a binary baseline scheme
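The mathematical definitions of the five schemes are given in the article; as an indication of their general structure, the sketch below illustrates a binary baseline reward in the spirit of Scheme 5. The flooding limit and reward magnitudes shown are assumed values for illustration only, not the article's actual definitions.

```python
# Illustrative binary baseline reward (in the spirit of Scheme 5).
# The flooding limit and reward magnitudes are assumptions for illustration,
# not the article's actual definitions.
FLOODING_LIMIT_PCT = 85.0  # assumed design limit on percent flooding

def binary_baseline_reward(top_flooding_pct: float, bot_flooding_pct: float) -> float:
    """Return a positive reward only if both column sections stay below the assumed flooding limit."""
    within_limit = (top_flooding_pct < FLOODING_LIMIT_PCT) and (bot_flooding_pct < FLOODING_LIMIT_PCT)
    return 1.0 if within_limit else 0.0
```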
Reward Model | Column Section | Flooding, Mean (%) | Flooding, Std. Dev. (%) | | | | 
---|---|---|---|---|---|---|---
Scheme 1 | TOP | 88.5 | 8.7 | 0.955 | 0.826 | 0.251 | 0
 | BOT | 88.8 | 8.6 | 0.945 | 0.796 | 0.216 | 0
Scheme 2 | TOP | 87.9 | 7.1 | 0.963 | 0.837 | 0.293 | 0
 | BOT | 88.2 | 6.1 | 0.974 | 0.835 | 0.146 | 0
Scheme 3 | TOP | 87.7 | 6.0 | 0.971 | 0.841 | 0.29 | 0
 | BOT | 88.0 | 5.6 | 0.976 | 0.823 | 0.214 | 0
Scheme 4 | TOP | 93.8 | 14.6 | 0.851 | 0.587 | 0.111 | 0
 | BOT | 94.0 | 14.6 | 0.845 | 0.57 | 0.102 | 0
Scheme 5 | TOP | 101.9 | 21.8 | 0.686 | 0.371 | 0.067 | 0
 | BOT | 102.4 | 22.0 | 0.686 | 0.359 | 0.052 | 0
SAC Setting | Hyperparameter | Hyperparameter | Hyperparameter | Replay Buffer Length | | Flooding, Mean (%) | Flooding, Std. Dev. (%)
---|---|---|---|---|---|---|---
1 | 0.2 | 0.05 | 0.99 | 50 | 0.945 | 85.3 | 4.0
2 | 0.2 | 0.05 | 0.9 | 50 | 0.918 | 85.9 | 3.6
3 | 0.5 | 0.05 | 0.9 | 100 | 0.881 | 86.8 | 5.3
4 | 0.2 | 0.05 | 0.9 | 100 | 0.886 | 86.8 | 6.3
5 | 0.2 | 0.05 | 0.99 | 100 | 0.826 | 88.5 | 8.7
6 | 0.2 | 0.01 | 0.99 | 50 | 0.978 | 84.8 | 2.1
7 | 0.5 | 0.01 | 0.9 | 50 | 0.865 | 86.8 | 5.2
8 | 0.1 | 0.01 | 0.9 | 50 | 0.881 | 86.7 | 5.1
9 | 0.2 | 0.01 | 0.9 | 50 | 0.938 | 85.4 | 3.2
10 | 0.5 | 0.01 | 0.9 | 100 | 0.916 | 86.5 | 5.3
11 | 0.1 | 0.01 | 0.9 | 100 | 0.947 | 85.8 | 3.2
12 | 0.2 | 0.01 | 0.9 | 100 | 0.932 | 86.4 | 6.1
13 | 0.2 | 0.01 | 0.99 | 100 | 0.870 | 87.5 | 7.7
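Hyperparameter settings such as those tabulated above are usually gathered into a single configuration object before training. The sketch below shows one way to do this; the field names are generic SAC hyperparameters, and the mapping of the table's unnamed columns onto them is an assumption rather than the authors' exact configuration.

```python
from dataclasses import dataclass

@dataclass
class SACConfig:
    """Generic SAC hyperparameter bundle (illustrative; the field set is an assumption)."""
    gamma: float = 0.99          # discount factor
    alpha: float = 0.2           # entropy temperature (exploration weight)
    tau: float = 0.01            # soft target-network update rate
    buffer_length: int = 100     # replay buffer length
    batch_size: int = 64         # minibatch size for gradient updates
    learning_rate: float = 3e-4  # optimizer step size for actor and critic networks

# Example: Setting 1 from the table, assuming its unnamed columns map to
# (alpha, tau, gamma, replay buffer length) in that order.
setting_1 = SACConfig(gamma=0.99, alpha=0.2, tau=0.05, buffer_length=50)
```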
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).