A Bio-Inspired Dopamine Model for Robots with Autonomous Decision-Making
Abstract
1. Introduction
2. Background
2.1. Dopamine Role in the Human Brain
2.2. Biologically Inspired Dopamine Models for Autonomous Agents
2.3. Autonomous and Adaptive Behaviour in Social Robotics
3. Bio-Inspired Dopamine Model for Autonomous Robots
3.1. Mini Social Robot
3.2. Software Architecture
3.2.1. Perception Manager
3.2.2. HRI System
- Human–robot interaction (HRI) manager: Controls the communication between the robot and the user. This manager organises the interaction with the user, avoiding conflicts that may occur if several actions/modules want to use the same resources. This module mainly communicates with the actions and operates between the perception manager and the expression manager.
- Expression manager: Sequences the robot actuators depending on the expressions the HRI manager sends. It manages the commands sent to each actuator to ensure the robot executes the requested expressions. Moreover, it receives spontaneous expressions via the Liveliness manager to generate lively behaviour.
3.2.3. User-Adaptive System
3.2.4. Liveliness Manager
3.2.5. Actions
3.3. Decision-Making System with Dopamine Regulation
- Sleep: This process has been selected to regulate the robot’s activity/inactivity, avoiding fatiguing the user with continuous interaction.
- Feeding: This process regulates the hunger of the robot and has been selected to involve the user in the interaction by taking care of the robot.
- Moisturising: Similar to feeding, this process involves the user in the interaction by encouraging the user to take care of Mini.
- Entertainment: This process regulates user-robot interactions by playing different games.
- Sleep: Mini closes its eyes, enters a relaxed mode with little movement, and occasionally snores by playing different sounds. The action is executed if Mini perceives, using a photosensor, that the light is off.
- Eat: Mini plays sounds that simulate chewing food. The action is executed if Mini detects a broccoli object using an RFID reader.
- Drink: Mini plays sounds that simulate drinking. The action can only be executed if the robot detects a water object using an RFID reader.
- Play: The robot performs different entertainment activities that are randomly chosen. Playing is only possible if the robot detects the user in front of it using a camera.
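The stimulus conditions in the action list above can be sketched as a simple gating table. This is a minimal illustration, not the robot's implementation: the dictionary `REQUIRED_STIMULUS`, the stimulus labels, and the function `available_actions` are hypothetical names introduced here.

```python
# Hypothetical mapping from each action to the stimulus that enables it,
# following the action list above. Names are illustrative only.
REQUIRED_STIMULUS = {
    "sleep": "lights_off",  # perceived with the photosensor
    "eat": "broccoli",      # detected with the RFID reader
    "drink": "water",       # detected with the RFID reader
    "play": "user",         # detected with the camera
}


def available_actions(perceived):
    """Return, in alphabetical order, the actions whose stimulus is perceived."""
    return sorted(
        action
        for action, stimulus in REQUIRED_STIMULUS.items()
        if stimulus in perceived
    )


if __name__ == "__main__":
    # With a water object and a user present, only Drink and Play are possible.
    print(available_actions({"water", "user"}))
```

Keeping the gating separate from action selection means a decision-making module only has to choose among the actions that are currently executable.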
4. Results
4.1. Experiments Setup and Conditions
4.2. Anticipation
4.3. Dynamic Stimuli and Rewards
4.4. Learning to Avoid Negative Situations
4.5. Learning Autonomous Action Selection
4.6. Integration into the Robot
5. Limitations
- Time discretisation: The model by Bogacz [6] faces challenges in learning due to the discretisation of time into states. Variations in dopamine generation occur when rewards appear between microstates, stemming from the model’s reliance on a single value per microstate in the primary gain matrix.
- Computational resources: The dopamine models presented in Section 2 that accurately represent dopamine secretion in the human brain rely on complex equations that demand moderate computational resources. This limitation becomes more pronounced as more stimuli and biological processes are added to the model, since the number of combinations the robot has to learn increases significantly. Consequently, computational cost can become problematic as the model grows.
- Empirical parameters: The model proposed in this paper depends on many parameters that affect its performance. We have selected their values based on the original models and an empirical evaluation conducted to obtain a specific robot behaviour that prioritises some processes (e.g., feeding) above others (e.g., sleep). However, factors such as the time constant, the influence of the agent’s reaction speed, and reward-related parameters strongly impact the resulting robot behaviour, and more tests comparing different configurations might contribute to a better understanding of the model dynamics.
- User evaluation: Robots work to assist people in many different applications. Consequently, robot users must test these systems before we deploy them in real scenarios. Mini is a robot dedicated to older adults’ healthcare, acting as a system for conducting cognitive stimulation activities while entertaining the user. The biologically inspired model shapes four biological processes intended for the user to take care of the robot, so evaluating how people perceive Mini’s behaviour in terms of diversity or enjoyment would be of interest to continue exploring this research line.
- Ethical considerations: The development of bio-inspired decision-making systems in robots designed for human–robot interaction raises several ethical considerations that must be addressed. These systems, such as those modelled after dopamine-driven behaviours, may unintentionally replicate human biases, leading to unfair or discriminatory decisions. Additionally, collecting personal data to enhance robot behaviour raises significant privacy concerns, necessitating robust data protection and transparency. Moreover, as robots become more human-like in their decision-making, the nature of human–robot interactions could change, potentially blurring the distinction between machines and living beings. This could lead to ethical challenges related to autonomy, consent, and the treatment of robots. Therefore, it is crucial to carefully consider these factors to ensure that bio-inspired robots enhance human–robot interactions in a fair, transparent, and ethically sound way.
6. Future Work
- Model dynamics: Further exploration of dynamic parameter determination based on real-time factors could refine the model’s responsiveness, especially concerning the simplification of dopamine secretion dynamics. The scenario we propose consists of configuring different parameters and testing how they influence the robot’s behaviour. For example, we want to test whether changing the variation rates of the biological processes affects user engagement with the robot, since a psychological theory states that a high frequency of actions might increase user engagement in social scenarios. Similarly, by changing the importance given to new experiences over past experiences, we can evaluate how well the robot personalises the selection of entertainment activities based on user performance and preferences.
- Role of pleasure: Pleasure plays a very important role in dopamine secretion. This study considers that dopamine increases when the robot consumes broccoli or drinks water. However, other types of food, drinks, or entertainment activities can produce different dopamine pleasure levels. Therefore, defining a robot with different preferences (for instance, preferring chocolate over broccoli) can be of interest to motivate users to discover these patterns and engage them with the robot.
- Testing in more complex scenarios: The scenarios we propose aim to evaluate the model’s performance and analyse its possibilities. However, further tests involving more biological processes such as social needs (talking, affect, or physical interaction) with more stimuli (caresses or different user responses) might produce a more comprehensive and meaningful robot behaviour that can operate in more challenging environments.
- Different robotic applications: This paper presents the application of the dopamine model to social robots interacting with people in an entertainment scenario. However, we want to explore the model’s performance in other scenarios like mobile robotics, where a motivational system based on dopamine, as proposed in this contribution, can promote environmental exploration, yielding interesting situations. For example, the robot’s exploration can be regulated by curiosity, a psychological factor highly affected by dopamine [7].
- Exploring other chemicals: This study considers the role of dopamine in motivated behaviour. However, other brain substances like serotonin or oxytocin influence social behaviour. Consequently, we propose including these chemicals to regulate Mini’s social behaviour, depending on the user’s responses and behaviour, and evaluating how people perceive these changes. Moreover, investigating the interactions among neurotransmitters and their impact on dopamine secretion is essential for improved model implementation.
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Lăzăroiu, G.; Pera, A.; Ștefănescu Mihăilă, R.O.; Mircică, N.; Negurită, O. Can neuroscience assist us in constructing better patterns of economic decision-making? Front. Behav. Neurosci. 2017, 11, 188.
- Banning, M. A review of clinical decision making: Models and current research. J. Clin. Nurs. 2008, 17, 187–195.
- Maroto-Gómez, M.; Alonso-Martín, F.; Malfaz, M.; Castro-González, Á.; Castillo, J.C.; Salichs, M.Á. A systematic literature review of decision-making and control systems for autonomous and social robots. Int. J. Soc. Robot. 2023, 15, 745–789.
- Bekey, G.A. Autonomous Robots: From Biological Inspiration to Implementation and Control; MIT Press: Cambridge, MA, USA, 2005.
- Liu, S.; Wang, L.; Gao, R.X. Cognitive neuroscience and robotics: Advancements and future research directions. Robot. Comput.-Integr. Manuf. 2024, 85, 102610.
- Bogacz, R. Dopamine role in learning and action inference. Elife 2020, 9, e53262.
- Seitz, B.M.; Hoang, I.B.; DiFazio, L.E.; Blaisdell, A.P.; Sharpe, M.J. Dopamine errors drive excitatory and inhibitory components of backward conditioning in an outcome-specific manner. Curr. Biol. 2022, 32, 3210–3218.
- Cañamero, L.; Lewis, M. Making new “New AI” friends: Designing a social robot for diabetic children from an embodied AI perspective. Int. J. Soc. Robot. 2016, 8, 523–537.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Molnár, Z.; Clowry, G.J.; Šestan, N.; Alzu’bi, A.; Bakken, T.; Hevner, R.F.; Hüppi, P.S.; Kostović, I.; Rakic, P.; Anton, E.; et al. New insights into the development of the human cerebral cortex. J. Anat. 2019, 235, 432–451.
- Burton, A.C.; Nakamura, K.; Roesch, M.R. From ventral-medial to dorsal-lateral striatum: Neural correlates of reward-guided decision-making. Neurobiol. Learn. Mem. 2015, 117, 51–59.
- Le Merrer, J.; Becker, J.A.; Befort, K.; Kieffer, B.L. Reward processing by the opioid system in the brain. Physiol. Rev. 2009, 89, 1379–1412.
- Salichs, M.A.; Castro-González, Á.; Salichs, E.; Fernández-Rodicio, E.; Maroto-Gómez, M.; Gamboa-Montero, J.J.; Marques-Villarroya, S.; Castillo, J.C.; Alonso-Martín, F.; Malfaz, M. Mini: A new social robot for the elderly. Int. J. Soc. Robot. 2020, 12, 1231–1249.
- Yeragani, V.K.; Tancer, M.; Chokka, P.; Baker, G.B. Arvid Carlsson, and the story of dopamine. Indian J. Psychiatry 2010, 52, 87.
- Crow, T. The relation between electrical self-stimulation sites and catecholamine-containing neurones in the rat mesencephalon. Experientia 1971, 27, 662.
- Fouriezos, G.; Hansson, P.; Wise, R.A. Neuroleptic-induced attenuation of brain stimulation reward in rats. J. Comp. Physiol. Psychol. 1978, 92, 661.
- Wise, R.A. Neuroleptics and operant behavior: The anhedonia hypothesis. Behav. Brain Sci. 1982, 5, 39–53.
- Simansky, K.J.; Bourbonais, K.A.; Smith, G.P. Food-related stimuli increase the ratio of 3,4-dihydroxyphenylacetic acid to dopamine in the hypothalamus. Pharmacol. Biochem. Behav. 1985, 23, 253–258.
- Blackburn, J.R.; Phillips, A.G.; Jakubovic, A.; Fibiger, H.C. Dopamine and preparatory behavior: II. A neurochemical analysis. Behav. Neurosci. 1989, 103, 15.
- Schultz, W.; Dayan, P.; Montague, P.R. A neural substrate of prediction and reward. Science 1997, 275, 1593–1599.
- Hayashi, E.; Yamasaki, T.; Kuroki, K. Autonomous behavior system combing motivation with consciousness using dopamine. In Proceedings of the 2009 IEEE International Symposium on Computational Intelligence in Robotics and Automation-(CIRA), Daejeon, Republic of Korea, 15–18 December 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 126–131.
- Baldassarre, G.; Mannella, F.; Fiore, V.G.; Redgrave, P.; Gurney, K.; Mirolli, M. Intrinsically motivated action–outcome learning and goal-based action recall: A system-level bio-constrained computational model. Neural Netw. 2013, 41, 168–187.
- Fiore, V.G.; Sperati, V.; Mannella, F.; Mirolli, M.; Gurney, K.; Friston, K.; Dolan, R.J.; Baldassarre, G. Keep focussing: Striatal dopamine multiple functions resolved in a single mechanism tested in a simulated humanoid robot. Front. Psychol. 2014, 5, 124.
- Krichmar, J.L. A neurorobotic platform to test the influence of neuromodulatory signaling on anxious and curious behavior. Front. Neurorobot. 2013, 7, 1.
- Friston, K. The free-energy principle: A unified brain theory? Nat. Rev. Neurosci. 2010, 11, 127–138.
- Miller, K.J.; Shenhav, A.; Ludvig, E.A. Habits without values. Psychol. Rev. 2019, 126, 292.
- Canamero, D. Modeling motivations and emotions as a basis for intelligent behavior. In Proceedings of the First International Conference on Autonomous Agents, Marina del Rey, CA, USA, 5–8 February 1997; pp. 148–155.
- Gadanho, S.C. Learning behavior-selection by emotions and cognition in a multi-goal robot task. J. Mach. Learn. Res. 2003, 4, 385–412.
- Malfaz, M.; Salichs, M.A. Using emotions for behaviour-selection learning. Front. Artif. Intell. Appl. 2006, 141, 697.
- Lisetti, C.L.; Brown, S.M.; Alvarez, K.; Marpaung, A.H. A social informatics approach to human-robot interaction with a service social robot. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2004, 34, 195–209.
- Ushida, H. Effect of social robot’s behavior in collaborative learning. In Proceedings of the 2010 5th ACM/IEEE International Conference on Human-Robot Interaction (HRI), Osaka, Japan, 2–5 March 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 195–196.
- Lewis, M.; Cañamero, L. A robot model of stress-induced compulsive behavior. In Proceedings of the 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII), Cambridge, UK, 3–6 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 559–565.
- Hiolle, A.; Lewis, M.; Cañamero, L. Arousal regulation and affective adaptation to human responsiveness by a robot that explores and learns a novel environment. Front. Neurorobot. 2014, 8, 17.
- Lones, J.; Lewis, M.; Cañamero, L. Hormonal modulation of development and behaviour permits a robot to adapt to novel interactions. In Proceedings of the Artificial Life Conference Proceedings: ALIFE 14: The Fourteenth International Conference on the Synthesis and Simulation of Living Systems, 30 July–2 August 2014; MIT Press: Cambridge, MA, USA, 2014; pp. 184–191.
- O’Brien, M.J.; Arkin, R.C. Adapting to environmental dynamics with an artificial circadian system. Adapt. Behav. 2020, 28, 165–179.
- Egido-García, V.; Estévez, D.; Corrales-Paredes, A.; Terrón-López, M.J.; Velasco-Quintana, P.J. Integration of a social robot in a pedagogical and logopedic intervention with children: A case study. Sensors 2020, 20, 6483.
- Hong, A.; Lunscher, N.; Hu, T.; Tsuboi, Y.; Zhang, X.; dos Reis Alves, S.F.; Nejat, G.; Benhabib, B. A multimodal emotional human–robot interaction architecture for social robots engaged in bidirectional communication. IEEE Trans. Cybern. 2020, 51, 5954–5968.
- Maroto-Gómez, M.; Castro-González, Á.; Malfaz, M.; Salichs, M.Á. A biologically inspired decision-making system for the autonomous adaptive behavior of social robots. Complex Intell. Syst. 2023, 9, 6661–6679.
- Davies, K.J.A. Adaptive homeostasis. Mol. Asp. Med. 2016, 49, 1–7.
- Smith, K.S.; Tindell, A.J.; Aldridge, J.W.; Berridge, K.C. Ventral pallidum roles in reward and motivation. Behav. Brain Res. 2009, 196, 155–167.
- Stelly, C.E.; Girven, K.S.; Lefner, M.J.; Fonzi, K.M.; Wanat, M.J. Dopamine release and its control over early Pavlovian learning differs between the NAc core and medial NAc shell. Neuropsychopharmacology 2021, 46, 1780–1787.
- Quigley, M.; Conley, K.; Gerkey, B.; Faust, J.; Foote, T.; Leibs, J.; Wheeler, R.; Ng, A. ROS: An open-source Robot Operating System. ICRA Workshop Open Source Softw. 2009, 3, 5.
Reference | Pros | Challenges to Be Addressed | Improvements by Our Approach |
---|---|---|---|
Cañamero et al. [27] | Pioneered emotion-driven decision-making, foundational for animal- and human-inspired robotic processes. | Limited scalability to more complex robotic systems. | A more complex dopamine model improves anticipation and learning, allowing better decision-making in dynamic environments. |
Gadanho [28] | Introduced reinforcement learning and cognitive elements, enabling agent evolution and adaptation. | Integration of emotion and cognition may introduce computational complexity. The learning method limits the addition of new processes. | A bio-inspired model facilitates the introduction of new behaviours by providing a modular architecture. |
Lisetti et al. [30] | Enhanced social skills through emotional expression, improving human–robot interaction. | Limited expressiveness and understanding of diverse human emotions. | Decision-making is based on biological processes and dopamine dynamics that consider the effect of stimuli on anticipating future rewards. |
Hiolle et al. [33] | Established an arousal-based model for decision-making, contributing to adaptive robotic behaviour. | Challenges in precisely mapping emotional states to arousal levels. | The biological model enables a more diverse behaviour control based on dynamic processes that evolve with time. |
O’Brien and Arkin [35] | Modelled circadian rhythms, aligning robot behaviour with daily human needs. | Implementation challenges in accurately capturing and mimicking circadian patterns. | Including a dopamine model with learning improves the adaptation to dynamic environments and reacting to dynamic stimuli. |
Hong et al. [37] | Developed a multimodal decision-making model, improving engagement through user-emotional state estimation. | Ensuring accurate estimation of user emotions and effective actuation based on these estimations. | The bio-inspired dopamine model allows the robot to make decisions based on the robot’s internal processes and ambient stimuli, not only considering user emotions. |
Maroto-Gómez et al. [38] | Established a biologically-influenced decision-making system, continuously updated based on stimuli. | Challenges in precisely mimicking and adapting to biological factors in decision-making. | Including the dopamine model with learning allows the robot to anticipate future rewards and adapt to dynamic environments. |
Term | Definition |
---|---|
Adaptation | The process by which an organism or system adjusts to changes in its environment or conditions to maintain functionality. |
Anticipation | The ability to predict or expect a future event or outcome, often influencing behaviour or decision-making. |
Biological process | A series of events or actions that occur in living organisms to maintain life, such as digestion, growth, or cellular repair. |
Cerebral cortex | The outer layer of the brain that is responsible for complex functions like thought, perception, memory, and decision-making. |
Deficit | A lack or shortage of something, often referring to a shortfall in a specific function, resource, or ability. |
Dopamine | A neurotransmitter in the brain that plays a key role in motivation, reward, and learning processes. |
Eligibility traces | A method in reinforcement learning that helps associate past actions with future rewards, improving learning efficiency. |
Expected reward | The predicted value or benefit that an individual anticipates receiving due to a specific action or decision. |
Gain matrix | In control theory, a matrix that is used to adjust the strength of control signals, optimising system performance. |
Homeostasis | The process by which living organisms regulate internal conditions, such as temperature or pH, to maintain a stable, healthy state. |
Motivation | The drive or desire to achieve a goal, often influenced by rewards, needs, or expectations. |
Reinforcement learning | A type of machine learning where an agent learns to make decisions by receiving rewards or punishments for its actions. |
Reward | A positive outcome or benefit from performing a particular action, which often reinforces that behaviour. |
Striatum | A brain region involved in planning and executing movements, processing rewards, and forming habits. |
Ventral tegmental area | A part of the brain that plays a crucial role in the reward system, releasing dopamine in response to rewarding stimuli. |
Well-being | A state of being comfortable, healthy, and happy, often encompassing both physical and mental health. |
Internal Process | Deficit | Variation Rate | Range | Stimulus | Action | Action Effect |
---|---|---|---|---|---|---|
Sleep | Fatigue | +0.2 | 0 to 100 | Lights off | Sleep | −8 fatigue deficit |
Feeding | Hunger | −0.5 | 100 to 0 | Broccoli | Eat | −10 hunger deficit |
Moisturising | Thirst | −0.5 | 100 to 0 | Water | Drink | −10 thirst deficit |
Entertainment | Boredom | −0.3 | 100 to 0 | User | Play | −6 boredom deficit |
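The dynamics in the table above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that each deficit grows by the magnitude of its variation rate at every step, that deficits are clipped to the interval 0 to 100, and that an action subtracts its effect after the growth of that step. The names `GROWTH`, `ACTION_EFFECT`, `clip`, and `step` are hypothetical.

```python
# Per-step growth of each deficit (magnitude of the variation rate in the
# table above) and the reduction produced by each action.
# Assumption: deficits grow every step and are clipped to [0, 100].
GROWTH = {"fatigue": 0.2, "hunger": 0.5, "thirst": 0.5, "boredom": 0.3}
ACTION_EFFECT = {
    "sleep": ("fatigue", 8.0),
    "eat": ("hunger", 10.0),
    "drink": ("thirst", 10.0),
    "play": ("boredom", 6.0),
}


def clip(value, low=0.0, high=100.0):
    """Keep a deficit inside its allowed range."""
    return max(low, min(high, value))


def step(deficits, action=None):
    """Advance all deficits by one step and apply the optional action."""
    updated = {name: clip(value + GROWTH[name]) for name, value in deficits.items()}
    if action is not None:
        target, amount = ACTION_EFFECT[action]
        updated[target] = clip(updated[target] - amount)
    return updated


if __name__ == "__main__":
    state = {name: 0.0 for name in GROWTH}
    for _ in range(20):          # 20 steps without acting
        state = step(state)
    state = step(state, "eat")   # one more step in which the robot eats
    print(state)
```

Under these assumptions, hunger and thirst grow fastest, so they dominate unless the corresponding stimulus is available often enough for the robot to act on them.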
Variable | Value | Role |
---|---|---|
 | 1 | Sampling period |
 | 20 | Duration of each microstate |
 | 0.03 | Learning factor |
 | 0.75 | Eligibility trace retention |
 | 2 | Time constant of dopamine secretion |
 | 20 | Time constant of reward |
s | 0.1 | Pleasure scale factor |
k | 0.05 | Pleasure decay factor |
 | 0.995 | Discount factor to reduce reward dispersion in time |
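To illustrate how a learning factor, an eligibility trace retention, and a discount factor interact, the sketch below runs textbook tabular TD(λ) with accumulating traces (Sutton and Barto [9]) over a chain of microstates. It is a swapped-in generic technique, not the Bogacz-based model used in this paper. Reading the retention value 0.75 as λ is an assumption, and the names `ALPHA`, `LAMBDA`, `GAMMA`, and `td_lambda_episode` are hypothetical. The prediction error `delta` plays the role usually associated with phasic dopamine [20].

```python
ALPHA = 0.03   # learning factor (value from the table above)
LAMBDA = 0.75  # eligibility trace retention (assumed to be lambda)
GAMMA = 0.995  # discount factor (value from the table above)


def td_lambda_episode(values, rewards):
    """Run one pass over a chain of microstates, updating `values` in place.

    rewards[t] is the reward obtained when leaving microstate t.
    Returns the prediction error (delta) computed at each microstate.
    """
    n = len(values)
    traces = [0.0] * n
    deltas = []
    for t in range(n):
        next_value = values[t + 1] if t + 1 < n else 0.0
        delta = rewards[t] + GAMMA * next_value - values[t]
        # Decay all traces, then mark the current microstate as eligible.
        traces = [GAMMA * LAMBDA * trace for trace in traces]
        traces[t] += 1.0
        # Credit every recently visited microstate with the same error.
        for i in range(n):
            values[i] += ALPHA * delta * traces[i]
        deltas.append(delta)
    return deltas


if __name__ == "__main__":
    values = [0.0] * 5
    rewards = [0.0, 0.0, 0.0, 0.0, 1.0]  # reward only at the last microstate
    for _ in range(200):
        deltas = td_lambda_episode(values, rewards)
    print(values)
    print(deltas)
```

With repeated passes, value propagates backwards from the rewarded microstate to earlier ones, which is the generic mechanism behind reward anticipation in this family of models.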
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Maroto-Gómez, M.; Burguete-Alventosa, J.; Álvarez-Arias, S.; Malfaz, M.; Salichs, M.Á. A Bio-Inspired Dopamine Model for Robots with Autonomous Decision-Making. Biomimetics 2024, 9, 504. https://doi.org/10.3390/biomimetics9080504