Real-Time Online Goal Recognition in Continuous Domains via Deep Reinforcement Learning
Abstract
1. Introduction
2. Background
2.1. Goal Recognition Problem
2.2. Goal Recognition as Planning
2.3. Goal Recognition as Learning
3. The Goal Recognition as Deep Reinforcement Learning Framework
Algorithm 1 Offline Training for Domain Theory
Require: State and action spaces in the continuous domain
Require: A map used for navigation tasks in the continuous domain
Algorithm 2 Online Inference of the Most Likely Goal for the Observations
Require: State and action spaces in the continuous domain, and policy evaluation networks
Require: A set of candidate goals
Require: An observation sequence
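The inputs listed for Algorithm 2 (policy evaluation networks, a set of candidate goals, and an observation sequence) suggest a scoring loop over goals. The following is a minimal sketch of one such loop, not the authors' procedure: it assumes one critic per candidate goal, scores each goal by the summed Q-values of the observed state-action pairs, and normalises with a softmax. The names `goal_posterior`, `most_likely_goal`, and `make_toy_critic`, the `temperature` parameter, and the toy critics are all hypothetical stand-ins for trained networks.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
QFunction = Callable[[State, Action], float]


def goal_posterior(
    q_functions: Dict[str, QFunction],
    observations: Sequence[Tuple[State, Action]],
    temperature: float = 1.0,
) -> Dict[str, float]:
    """Softmax over the summed Q-values each goal's critic assigns to the observations.

    Assumption: summed Q-values are used as the goal score; the source does not
    state the scoring rule.
    """
    scores = {
        goal: sum(q(state, action) for state, action in observations)
        for goal, q in q_functions.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {g: math.exp((v - top) / temperature) for g, v in scores.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def most_likely_goal(posterior: Dict[str, float]) -> str:
    """Return the goal with the highest posterior probability."""
    return max(posterior, key=posterior.get)


def make_toy_critic(goal_xy: Tuple[float, float]) -> QFunction:
    """Hypothetical critic: scores an action by how well it points at the goal."""
    def q(state: State, action: Action) -> float:
        dx, dy = goal_xy[0] - state[0], goal_xy[1] - state[1]
        norm = math.hypot(dx, dy) or 1.0
        return (action[0] * dx + action[1] * dy) / norm
    return q


critics = {"A": make_toy_critic((10.0, 0.0)), "B": make_toy_critic((0.0, 10.0))}
observed = [((0.0, 0.0), (1.0, 0.0)), ((1.0, 0.0), (1.0, 0.1))]
posterior = goal_posterior(critics, observed)
print(most_likely_goal(posterior), posterior)
```

Because the score is a running sum, the posterior can be refreshed after each new observation without replanning, which is the property an online recogniser needs.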
4. Goal Recognition as TD3
4.1. Basic Principles of the TD3 Algorithm
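For orientation, TD3 differs from DDPG in three ways: the critic target uses the smaller of two target-critic estimates (clipped double-Q learning), the target action is perturbed with clipped noise (target policy smoothing), and the actor and target networks are updated less often than the critics (delayed updates). The sketch below illustrates only the target computation. The callables `target_actor`, `target_q1`, and `target_q2` are hypothetical stand-ins for networks, and the default hyperparameters are commonly used values, not the settings reported in this paper.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def clip(value: float, low: float, high: float) -> float:
    """Clamp value to the closed interval [low, high]."""
    return max(low, min(high, value))


def td3_target(
    reward: float,
    next_state: Vector,
    done: bool,
    target_actor: Callable[[Vector], Vector],
    target_q1: Callable[[Vector, Vector], float],
    target_q2: Callable[[Vector, Vector], float],
    gamma: float = 0.99,        # assumed discount factor
    policy_noise: float = 0.2,  # assumed std of the smoothing noise
    noise_clip: float = 0.5,    # assumed bound on the smoothing noise
    max_action: float = 1.0,    # assumed symmetric action bound
    rng=random,
) -> float:
    """Critic target y = r + gamma * (1 - done) * min(Q1', Q2') at a smoothed action."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    next_action: List[float] = [
        clip(
            a + clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip),
            -max_action,
            max_action,
        )
        for a in target_actor(next_state)
    ]
    # Clipped double-Q: take the more pessimistic of the two target critics.
    q_min = min(target_q1(next_state, next_action), target_q2(next_state, next_action))
    return reward + gamma * (0.0 if done else 1.0) * q_min


# Toy stand-ins with constant outputs, so the result does not depend on the noise.
actor = lambda s: [0.5]
q1 = lambda s, a: 2.0
q2 = lambda s, a: 3.0
y = td3_target(1.0, [0.0], False, actor, q1, q2)
y_done = td3_target(1.0, [0.0], True, actor, q1, q2)
print(y, y_done)
```

In a full training loop both critics regress toward this same target, while the actor and the target networks are updated only every few critic steps.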
4.2. Parameter Design of the TD3 Algorithm’s Basic Structure
5. Experiment Evaluation
5.1. Offline Training of the TD3 Algorithm in ROS
5.2. Testing in a Continuous Domain
5.2.1. Testing under Partial Observability
5.2.2. Testing under Observation Sequence Noise
5.2.3. Online Recognition Speed Testing
6. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
GR_DRL | Goal Recognition as Deep Reinforcement Learning
GR | Goal Recognition
PRAP | Plan Recognition as Planning
PDDL | Planning Domain Definition Language
XGBoost | eXtreme Gradient Boosting
LSTM | Long Short-Term Memory networks
TD3 | Twin Delayed Deep Deterministic Policy Gradient
DDPG | Deep Deterministic Policy Gradient
FC | Fully Connected
ReLU | Rectified Linear Unit
CFC | Combined Fully Connected Layer
SNR | Signal-to-Noise Ratio
Partial OBS | Type | acc (Dynamic) | acc (Static) | prec (Dynamic) | prec (Static) | rec (Dynamic) | rec (Static) | f-s (Dynamic) | f-s (Static)
---|---|---|---|---|---|---|---|---|---
5% | sampling rates | 0.90 | 0.94 | 0.74 | 0.86 | 0.74 | 0.86 | 0.74 | 0.86
10% | sampling rates | 0.92 | 0.96 | 0.79 | 0.89 | 0.79 | 0.89 | 0.79 | 0.89
30% | sampling rates | 0.92 | 0.96 | 0.79 | 0.89 | 0.79 | 0.89 | 0.79 | 0.89
50% | sampling rates | 0.92 | 0.96 | 0.79 | 0.89 | 0.79 | 0.89 | 0.79 | 0.89
100% | sampling rates | 0.94 | 0.96 | 0.84 | 0.89 | 0.84 | 0.89 | 0.84 | 0.89
avg | sampling rates | 0.92 | 0.95 | 0.79 | 0.88 | 0.79 | 0.88 | 0.79 | 0.88
5% | sensor failures | 0.82 | 0.83 | 0.56 | 0.58 | 0.58 | 0.60 | 0.57 | 0.59
10% | sensor failures | 0.86 | 0.89 | 0.66 | 0.72 | 0.66 | 0.72 | 0.66 | 0.72
30% | sensor failures | 0.87 | 0.90 | 0.68 | 0.76 | 0.68 | 0.76 | 0.68 | 0.76
50% | sensor failures | 0.87 | 0.90 | 0.68 | 0.76 | 0.68 | 0.76 | 0.68 | 0.76
100% | sensor failures | 0.92 | 0.96 | 0.79 | 0.89 | 0.79 | 0.89 | 0.79 | 0.89
avg | sensor failures | 0.87 | 0.90 | 0.67 | 0.74 | 0.68 | 0.75 | 0.68 | 0.74
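
The tables report acc, prec, rec, and f-s without stating how these are aggregated over candidate goals. One aggregation under which precision, recall, and F-score coincide, as they do in most rows, is micro-averaging over one-vs-rest counts. The sketch below illustrates that reading only; it is an assumption about the evaluation, not something the source confirms. The function name `micro_one_vs_rest`, the goal labels, and the toy predictions are hypothetical.

```python
from typing import Dict, Sequence


def micro_one_vs_rest(
    true_goals: Sequence[str],
    predicted_goals: Sequence[str],
    goals: Sequence[str],
) -> Dict[str, float]:
    """Pool one-vs-rest confusion counts over all goals, then compute the metrics."""
    tp = fp = fn = tn = 0
    for goal in goals:
        for truth, pred in zip(true_goals, predicted_goals):
            if pred == goal and truth == goal:
                tp += 1
            elif pred == goal:
                fp += 1
            elif truth == goal:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"acc": accuracy, "prec": precision, "rec": recall, "f-s": f_score}


# Hypothetical run: five candidate goals, ten episodes, eight recognised correctly.
goals = ["g1", "g2", "g3", "g4", "g5"]
truth = ["g1", "g2", "g3", "g4", "g5", "g1", "g2", "g3", "g4", "g5"]
preds = ["g1", "g2", "g3", "g4", "g5", "g1", "g2", "g3", "g1", "g2"]
metrics = micro_one_vs_rest(truth, preds, goals)
print(metrics)
```

Under this aggregation accuracy is dominated by true negatives, so with several candidate goals it stays well above precision even when many predictions are wrong.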
Noise type | Observation | SNR | acc (Dynamic) | acc (Static) | prec (Dynamic) | prec (Static) | rec (Dynamic) | rec (Static) | f-s (Dynamic) | f-s (Static)
---|---|---|---|---|---|---|---|---|---|---
Gaussian noise | full obs | 10 dB | 0.92 | 0.96 | 0.81 | 0.90 | 0.81 | 0.90 | 0.81 | 0.90
Gaussian noise | full obs | 50 dB | 0.93 | 0.97 | 0.82 | 0.93 | 0.82 | 0.93 | 0.82 | 0.93
Gaussian noise | full obs | 80 dB | 0.93 | 0.97 | 0.82 | 0.93 | 0.82 | 0.93 | 0.82 | 0.93
Gaussian noise | 50% obs | 10 dB | 0.88 | 0.92 | 0.69 | 0.80 | 0.69 | 0.80 | 0.69 | 0.80
Gaussian noise | 50% obs | 50 dB | 0.92 | 0.97 | 0.81 | 0.92 | 0.81 | 0.92 | 0.81 | 0.92
Gaussian noise | 50% obs | 80 dB | 0.92 | 0.97 | 0.81 | 0.93 | 0.81 | 0.93 | 0.81 | 0.93
Laplace noise | full obs | 10 dB | 0.83 | 0.92 | 0.57 | 0.80 | 0.57 | 0.80 | 0.57 | 0.80
Laplace noise | full obs | 50 dB | 0.84 | 0.94 | 0.60 | 0.84 | 0.60 | 0.84 | 0.60 | 0.84
Laplace noise | full obs | 80 dB | 0.86 | 0.96 | 0.64 | 0.90 | 0.64 | 0.90 | 0.64 | 0.90
Laplace noise | 50% obs | 10 dB | 0.78 | 0.88 | 0.46 | 0.71 | 0.46 | 0.71 | 0.46 | 0.71
Laplace noise | 50% obs | 50 dB | 0.82 | 0.91 | 0.54 | 0.77 | 0.54 | 0.77 | 0.54 | 0.77
Laplace noise | 50% obs | 80 dB | 0.84 | 0.94 | 0.59 | 0.84 | 0.59 | 0.84 | 0.59 | 0.84
Poisson noise | full obs | 10 dB | 0.86 | 0.95 | 0.64 | 0.87 | 0.64 | 0.87 | 0.64 | 0.87
Poisson noise | full obs | 50 dB | 0.93 | 0.96 | 0.82 | 0.90 | 0.82 | 0.90 | 0.82 | 0.90
Poisson noise | full obs | 80 dB | 0.94 | 0.97 | 0.84 | 0.93 | 0.84 | 0.93 | 0.84 | 0.93
Poisson noise | 50% obs | 10 dB | 0.61 | 0.60 | 0.02 | 0.00 | 0.02 | 0.00 | 0.02 | 0.00
Poisson noise | 50% obs | 50 dB | 0.92 | 0.60 | 0.81 | 0.00 | 0.81 | 0.00 | 0.81 | 0.00
Poisson noise | 50% obs | 80 dB | 0.93 | 0.95 | 0.82 | 0.87 | 0.82 | 0.87 | 0.82 | 0.87
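
The noise levels in this table are given as signal-to-noise ratios in decibels, where SNR(dB) = 10 log10(P_signal / P_noise). The sketch below shows how Gaussian noise could be scaled to hit a target SNR on a one-dimensional observation channel. It assumes signal power is the mean square of the raw samples, which the source does not specify, and it covers only the Gaussian case. The function names and the sine-wave stand-in for a trajectory coordinate are hypothetical.

```python
import math
import random
from typing import List, Sequence


def signal_power(samples: Sequence[float]) -> float:
    """Mean-square power of a sample sequence (assumed definition)."""
    return sum(x * x for x in samples) / len(samples)


def add_gaussian_noise(
    samples: Sequence[float], snr_db: float, rng: random.Random
) -> List[float]:
    """Add zero-mean Gaussian noise whose expected power matches the target SNR."""
    noise_power = signal_power(samples) / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in samples]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical SNR in dB between a clean sequence and its noisy version."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10.0 * math.log10(signal_power(clean) / signal_power(noise))


rng = random.Random(0)  # fixed seed so the example is repeatable
clean = [math.sin(0.01 * i) for i in range(10000)]  # stand-in for one coordinate
noisy = add_gaussian_noise(clean, snr_db=10.0, rng=rng)
print(round(measured_snr_db(clean, noisy), 1))
```

At 10 dB the noise carries one tenth of the signal power, while at 80 dB it is eight orders of magnitude weaker.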
Online Recognition | acc (Dynamic) | acc (Static) | Time (s) (Dynamic) | Time (s) (Static)
---|---|---|---|---
5% | 0.90 | 0.94 | 10.68 | 10.17
10% | 0.92 | 0.96 | 10.55 | 9.90
30% | 0.92 | 0.96 | 10.54 | 9.73
50% | 0.92 | 0.96 | 10.53 | 9.67
100% | 0.94 | 0.96 | 10.51 | 9.66
avg | 0.92 | 0.95 | 10.56 | 9.83
Gaussian Noise (SNR) | Partial OBS | acc (Dynamic) | acc (Static) | Time (s) (Dynamic) | Time (s) (Static)
---|---|---|---|---|---
10 dB | full obs | 0.92 | 0.96 | 11.00 | 10.30
50 dB | full obs | 0.93 | 0.97 | 10.86 | 10.12
80 dB | full obs | 0.93 | 0.97 | 10.63 | 10.00
avg | full obs | 0.93 | 0.97 | 10.83 | 10.14
10 dB | 50% obs | 0.92 | 0.96 | 11.18 | 10.83
50 dB | 50% obs | 0.93 | 0.97 | 11.17 | 10.72
80 dB | 50% obs | 0.93 | 0.97 | 11.07 | 10.72
avg | 50% obs | 0.93 | 0.97 | 11.14 | 10.76
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, Z.; Chen, D.; Zeng, Y.; Wang, T.; Xu, K. Real-Time Online Goal Recognition in Continuous Domains via Deep Reinforcement Learning. Entropy 2023, 25, 1415. https://doi.org/10.3390/e25101415