Multi-agent reinforcement learning with approximate model learning for competitive games
Authors:
Young Joon Park, Yoon Sang Cho, Seoung Bum Kim
Authors' affiliation:
School of Industrial Management Engineering, Korea University, Seoul, Republic of Korea
Published in:
PLoS ONE 14(9)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0222215
Abstract
We propose a method for learning multi-agent policies to compete against multiple opponents. The method consists of recurrent neural network-based actor-critic networks trained with deterministic policy gradients, which promote cooperation between agents through communication. The learning process does not require access to the opponents' parameters or observations because the agents are trained separately from the opponents. The actor networks enable the agents to communicate using forward and backward paths, while the critic network helps train the actors by delivering gradient signals based on each agent's contribution to the global reward. Moreover, to address the nonstationarity caused by the evolving policies of other agents, we propose approximate model learning that uses auxiliary prediction networks to model state transitions, the reward function, and opponent behavior. In the test phase, we use competitive multi-agent environments to demonstrate the usefulness and superiority of the proposed method in terms of learning efficiency and goal achievement. The comparison results show that the proposed method outperforms the alternatives.
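As a rough illustration of the approximate model learning idea described in the abstract, the following PyTorch sketch attaches three auxiliary prediction heads (state transition, reward, and opponent action) to a recurrent actor. This is a minimal sketch under assumed interfaces: the class and head names, layer sizes, and loss form are hypothetical and are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentActorWithAuxHeads(nn.Module):
    """Sketch of a recurrent actor augmented with auxiliary prediction networks
    for approximate model learning. Names and dimensions are illustrative
    assumptions, not the authors' exact architecture."""

    def __init__(self, obs_dim, act_dim, opp_act_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)        # recurrence over time steps
        self.policy_head = nn.Linear(hidden_dim, act_dim)    # deterministic action output
        # Auxiliary prediction networks (the "approximate model"):
        self.transition_head = nn.Linear(hidden_dim + act_dim, obs_dim)  # next observation
        self.reward_head = nn.Linear(hidden_dim + act_dim, 1)            # immediate reward
        self.opponent_head = nn.Linear(hidden_dim, opp_act_dim)          # opponent's action

    def forward(self, obs, h):
        h = self.rnn(torch.relu(self.encoder(obs)), h)
        action = torch.tanh(self.policy_head(h))              # continuous action in [-1, 1]
        h_a = torch.cat([h, action], dim=-1)
        preds = {
            "next_obs": self.transition_head(h_a),
            "reward": self.reward_head(h_a),
            "opp_action": torch.tanh(self.opponent_head(h)),
        }
        return action, h, preds


def auxiliary_loss(preds, next_obs, reward, opp_action):
    """Regression losses whose targets (next observation, reward, observed opponent
    action) come from the agent's own trajectories, so no access to the opponents'
    parameters or observations is required."""
    return (F.mse_loss(preds["next_obs"], next_obs)
            + F.mse_loss(preds["reward"], reward)
            + F.mse_loss(preds["opp_action"], opp_action))
```

In training, such auxiliary losses would be added to the deterministic policy gradient objective so that the recurrent hidden state is also shaped to predict the environment dynamics and the opponents' actions, which is one way to mitigate the nonstationarity the abstract describes.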