Vibration Control with Reinforcement Learning Based on Multi-Reward Lightweight Networks
Abstract
1. Introduction
2. Problem Formulation
2.1. Dynamic Model
2.2. Active Control Implementation
3. Method
3.1. Vibration System Simulator
3.1.1. FIR Filter
3.1.2. Using FIR Filters to Build the Simulator
3.2. Algorithm Framework
3.3. Actor–Critic (AC) Network
3.3.1. Classic ResNet Architecture
3.3.2. The Basic Architecture Design of DRSNs
- (1) Theoretical background
- (2) Developed DRSN-CS framework
- (3) Developed DRSN-CW framework
- (4) Detailed Network Architecture of the Actor
3.4. Multi-Reward Mechanism
3.5. Priority Experience Replaying
4. Experimental Results and Discussion
4.1. Lightweight Comparative Experiment
4.2. Comparison between Prioritized Experience Replaying and Non-Prioritized Experience Replaying
4.3. Optimal Reward Function Combination
4.4. Demonstrating the Feasibility of This Lightweight Method
4.5. Alleviating Overestimation Experiments
4.6. Relevant Comparative Experiments
4.6.1. Relevant Comparative Experiments for Lightweightness
4.6.2. Relevant Comparative Experiments for Prioritized Experience Replay
4.6.3. Relevant Comparative Experiments for the Optimal Reward Functions
4.6.4. Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Algorithm | Uncontrolled (V) | Controlled (V) | Parameter Count | Reduction (dB) |
---|---|---|---|---|
FC | 0.3311 | 0.0762 | 507398 | 12.750 |
CNN | 0.3311 | 0.0870 | 3102254 | 11.604 |
GoogLeNet | 0.3311 | 0.0812 | 15774938 | 11.894 |
LSTM | 0.3311 | 0.0875 | 375326 | 11.516 |
DRSL-MPER | 0.3311 | 0.0328 | 67554 | 20.240 |
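The attenuation figures above appear consistent with the usual amplitude-ratio definition of decibel reduction, 20·log10(uncontrolled/controlled). The short check below is only a sketch under that assumption (the paper's exact formula is not reproduced here); small discrepancies, e.g. for DRSL-MPER, are plausibly rounding of the tabulated amplitudes.

```python
import math

# Assumed relationship (not stated in this excerpt):
#   reduction_dB = 20 * log10(uncontrolled / controlled)
rows = {
    "FC":        (0.3311, 0.0762),   # reported: 12.750 dB
    "CNN":       (0.3311, 0.0870),   # reported: 11.604 dB
    "DRSL-MPER": (0.3311, 0.0328),   # reported: 20.240 dB
}

for name, (uncontrolled, controlled) in rows.items():
    reduction_db = 20.0 * math.log10(uncontrolled / controlled)
    print(f"{name}: {reduction_db:6.3f} dB")
```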
Reward Combination (V, MSV, RMS, OM) | Reduction (dB) |
---|---|
V | 15.988 |
MSV | 16.032 |
RMS | 16.284 |
OM | 13.722 |
V, MSV | 3.496 |
V, RMS | 16.094 |
V, OM | 12.062 |
MSV, RMS | 20.240 |
MSV, OM | 3.518 |
RMS, OM | 9.238 |
V, MSV, RMS | 16.402 |
V, MSV, OM | 15.598 |
V, RMS, OM | 2.234 |
MSV, RMS, OM | 15.074 |
V, MSV, RMS, OM | 13.858 |
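If MSV and RMS denote mean-square and root-mean-square penalties on the residual vibration signal, the multi-reward combination that performs best above (MSV, RMS) can be pictured as a weighted sum of per-term rewards. The sketch below is illustrative only; the function names, the unit weights, and even this reading of the abbreviations are assumptions, not the paper's definitions.

```python
import numpy as np

# Hypothetical per-step reward terms built from the residual vibration signal y(t).
# Penalties are negative: smaller residual vibration -> larger (less negative) reward.
def mean_square_reward(y: np.ndarray) -> float:
    return -float(np.mean(y ** 2))            # "MSV"-style term (assumed reading)

def rms_reward(y: np.ndarray) -> float:
    return -float(np.sqrt(np.mean(y ** 2)))   # "RMS"-style term (assumed reading)

def combined_reward(y: np.ndarray, w_msv: float = 1.0, w_rms: float = 1.0) -> float:
    # Simple weighted sum of the two terms; the paper's exact weighting is not given here.
    return w_msv * mean_square_reward(y) + w_rms * rms_reward(y)

# Example: residual vibration over one control step
y = 0.05 * np.sin(2 * np.pi * 50 * np.linspace(0.0, 0.02, 200))
print(combined_reward(y))
```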
Actor | Critic | RL Algorithm | Parameter Count | Reduction (dB) |
---|---|---|---|---|
FC256 | FC256 | DDPG | 338437 | 12.722 |
DRSN | FC256 | DDPG | 191545 | 13.102 |
FC256 | DRSN | DDPG | 191961 | 14.727 |
DRSN | DRSN | DDPG | 45069 | 12.966 |
FC256 | FC256 | TD3 | 507398 | 14.276 |
DRSN | FC256 | TD3 | 360506 | 16.196 |
FC256 | DRSN | TD3 | 214446 | 13.014 |
DRSN | DRSN | TD3 | 67554 | 20.240 |
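DRSN in these rows refers to the deep residual shrinkage network used for the actor and/or critic; its characteristic operation is soft thresholding of feature maps with learned thresholds (channel-shared in DRSN-CS, channel-wise in DRSN-CW). Below is a minimal NumPy sketch of channel-wise soft thresholding; the threshold values here are placeholders, whereas in a DRSN they are produced by a small attention sub-network.

```python
import numpy as np

def soft_threshold(x: np.ndarray, tau: np.ndarray) -> np.ndarray:
    """Soft thresholding: shrink values toward zero by tau, zeroing small (noise-like) entries."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Channel-wise (CW) variant: one threshold per channel, broadcast over the time axis.
features = np.random.randn(4, 128)                               # (channels, time), placeholder data
tau_cw = 0.1 * np.mean(np.abs(features), axis=1, keepdims=True)  # placeholder thresholds
shrunk = soft_threshold(features, tau_cw)
print(shrunk.shape)
```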
Reward Combination (V, MSV, RMS, OM) | Reduction (dB) |
---|---|
V | 10.458 |
MSV | 11.990 |
RMS | 13.506 |
OM | 9.156 |
V, MSV | 1.938 |
V, RMS | 13.386 |
V, OM | 6.504 |
MSV, RMS | 13.524 |
MSV, OM | 2.256 |
RMS, OM | 5.430 |
V, MSV, RMS | 13.096 |
V, MSV, OM | 11.508 |
V, RMS, OM | 1.284 |
MSV, RMS, OM | 10.526 |
V, MSV, RMS, OM | 9.318 |
Algorithm | Average Convergence Episodes | Parameter Count | Reduction (dB) |
---|---|---|---|
Baseline DDPG (FC + R_OM) | 2031 | 338437 | 12.728 |
DDPG (DRSN + R_OM) | 1562 | 45069 | 15.328 |
DDPG (DRSN + R_MSV+RMS) | 1468 | 45069 | 16.210 |
DDPG (DRSN + R_MSV+RMS + PER) | 634 | 45069 | 18.108 |
DRSL-MPER (TD3 + DRSN + R_MSV+RMS + PER) | 652 | 67554 | 20.240 |
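Several ablation rows add prioritized experience replay (PER). The class below is a compact sketch of proportional prioritization in the style of Schaul et al.; the capacity, alpha, beta values and all names are illustrative choices, not the configuration used in the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity: int = 10000, alpha: float = 0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition, td_error: float = 1.0):
        # New samples get priority (|TD error| + eps)^alpha so every transition can be sampled.
        p = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        probs = self.priorities[: len(self.data)]
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = (np.abs(td_errors) + 1e-6) ** self.alpha
```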
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).