*2.3. Multi-Agent Deep Reinforcement Learning*

If we ignore the kinematics and consider only the behavioral strategies of a group of robots, many novel works on multi-agent reinforcement learning (MARL) have appeared in recent years. Raileanu et al. [37] proposed self other-modeling (SOM), in which an agent uses its own policy to predict the other agent's actions and updates its belief about that agent's hidden state; these estimates are then used to choose the agent's own actions. Yang et al. [38] reduced the interactions among agents to an average effect: by introducing mean-field theory, they modeled each agent as interacting only with the average effect of the other agents. Wang et al. [39] estimated an opponent's future behavior from its historical information. Tacchetti et al. [40] proposed relational forward models for MARL tasks. They added a relational graph model to the action-selection stage and used a recurrent neural network (RNN) to model the relationships between agents. However, because these methods rely on idealized assumptions, considerable work remains before they can be applied to multi-robot navigation tasks.
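To make the mean-field idea concrete, the following is a minimal sketch, not the implementation of [38]. It assumes discrete actions, approximates an agent's Q-function as Q(s, a^j, ā^j), where ā^j is the average of the neighbors' one-hot actions, and applies one tabular-style update to a single (s, a^j, ā^j) entry. The function names (`mean_action`, `q_values`, `boltzmann`, `mean_field_update`), the toy linear Q-function, and the default hyperparameters are hypothetical choices made for illustration. The sketch also assumes that `next_qs` has already been evaluated at the next state with the next mean action.

```python
import math
from typing import List, Sequence


def mean_action(neighbor_actions: Sequence[int], n_actions: int) -> List[float]:
    """Average of the neighbors' one-hot actions (the mean-field term abar^j)."""
    counts = [0.0] * n_actions
    for a in neighbor_actions:
        counts[a] += 1.0
    n = len(neighbor_actions)
    # Assumption for this sketch: an agent with no neighbors gets a zero vector.
    return [c / n for c in counts] if n else counts


def q_values(weights: Sequence[Sequence[float]], abar: Sequence[float]) -> List[float]:
    """Hypothetical linear stand-in for Q(s, a^j, abar^j) at a fixed state s:
    one weight row per own action a^j, dotted with abar^j."""
    return [sum(w * m for w, m in zip(row, abar)) for row in weights]


def boltzmann(qs: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax policy pi^j(a^j | s, abar^j) over the agent's own actions."""
    top = max(qs)  # subtract the max for numerical stability
    exps = [math.exp((q - top) / temperature) for q in qs]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field_update(q_old: float, reward: float, next_qs: Sequence[float],
                      alpha: float = 0.1, gamma: float = 0.95,
                      temperature: float = 1.0) -> float:
    """One mean-field Q-learning step for the taken (a^j, abar^j) entry:
    Q <- (1 - alpha) * Q + alpha * (r + gamma * v(s')),
    where v(s') = sum_a pi(a | s', abar') * Q(s', a, abar')."""
    pi_next = boltzmann(next_qs, temperature)
    v_next = sum(p * q for p, q in zip(pi_next, next_qs))
    return (1 - alpha) * q_old + alpha * (reward + gamma * v_next)


if __name__ == "__main__":
    # Four neighbors and three discrete actions: the joint action would grow
    # with the number of agents, but abar^j always has length 3.
    abar = mean_action([0, 2, 2, 1], n_actions=3)
    print(abar)  # [0.25, 0.25, 0.5]
    weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy values
    qs = q_values(weights, abar)
    print(boltzmann(qs))
    print(mean_field_update(q_old=0.0, reward=1.0, next_qs=qs))
```

The design choice this illustrates is the one attributed to [38] above. The Q-function's input depends on the agent's own action and a fixed-length average of the others' actions, not on the full joint action, so its size does not grow with the number of agents.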
