
A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning


In this post, we take a detailed look at the foundational paper A Unified Game-Theoretic Approach to Multi-agent Reinforcement Learning, which played a pivotal role in shaping the development of AlphaStar. AlphaStar involves many intricate concepts; here we focus on the key principles built on the Nash equilibrium framework and on how game theory integrates with reinforcement learning techniques.

By the end of this article, you should have a working understanding of the Double Oracle algorithm, Policy-Space Response Oracles, and Deep Cognitive Hierarchies.

To follow this post, you should be familiar with a few fundamental concepts of game theory: the definition of a strategic game as a payoff matrix, Nash equilibria, and best responses. You are also encouraged to explore the conceptual implementations provided here, including a Python implementation using numpy that offers further insight into these topics.

Why a Multi-agent Context?

In AlphaStar, a multi-agent setup was used to improve strategic decision-making through self-play: the final strategy emerges from a population-based multi-agent system whose agents evolve by interacting with one another via reinforcement learning. Multi-agent RL (MARL) here relies on iterative improvement, repeatedly computing approximate best responses to mixtures of policies with deep reinforcement learning.

To discuss multi-agent RL, we also need the notion of a normal-form game from game theory: a tuple (Π, U, n), where n is the number of players, Π = (Π_1, ..., Π_n) collects the set of policies available to each player, and U: Π → R^n assigns a payoff (utility) to every joint policy.
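To make the definition concrete, here is a minimal numpy sketch of a two-player normal-form game together with a pure-strategy best-response computation. The game (Rock-Paper-Scissors) and the function name best_response are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative two-player zero-sum normal-form game: Rock-Paper-Scissors.
# A[i, j] is the payoff of player 0 when it plays i and player 1 plays j;
# player 1's payoff is -A[i, j].
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]])

def best_response(payoff, opponent_mixture):
    """Pure-strategy best response of the row player to a mixed strategy."""
    expected = payoff @ opponent_mixture   # expected payoff of each row action
    return int(np.argmax(expected)), expected

# Best response of player 0 when player 1 mixes 50% rock / 50% paper:
# paper wins against rock and ties against paper, so it is selected.
br, values = best_response(A, np.array([0.5, 0.5, 0.0]))
print("best response:", br, "expected payoffs:", values)
```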

Double Oracle Algorithm

The Double Oracle algorithm and its generalizations form the foundation of the paper's game-theoretic analysis of reinforcement learning in multi-agent systems.

The algorithm proceeds iteratively, growing the payoff matrix of a restricted subgame step by step. At each iteration t, a Nash equilibrium σ of the current subgame G_t is computed, along with each player's best response π to σ; the best responses are added to the players' strategy sets and determine the next subgame G_{t+1}. To approximate the entries of the payoff matrix, deep neural networks are used as function approximators.


(Figure: a visual explanation of the DO algorithm, following Bošanský, Lisý, Čermák, Vítek, and Pěchouček.) An important consideration is that an approximate best response is computed rather than the exact one; this keeps the computation feasible while still yielding satisfactory outcomes.
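As a concrete illustration, here is a minimal Double Oracle sketch for a two-player zero-sum matrix game using numpy and scipy. It computes exact best responses (unlike the approximate ones discussed above) and solves each restricted subgame with a standard linear program; the function names and the LP formulation are mine, not the paper's reference implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Row player's Nash mixture for the zero-sum matrix game A
    (row player maximizes x^T A y), via a standard linear program."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])             # maximize the game value v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])             # v - (x^T A)_j <= 0 for all j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]   # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * m + [(None, None)]
    res = linprog(c, A_ub, b_ub, A_eq, b_eq, bounds=bounds)
    return res.x[:m]

def double_oracle(A, max_iters=50):
    """Minimal Double Oracle sketch on a zero-sum matrix game A.
    Starts with one action per player and adds exact best responses
    until the restricted subgame stops growing."""
    rows, cols = [0], [0]                          # restricted strategy sets
    for _ in range(max_iters):
        sub = A[np.ix_(rows, cols)]                # payoff matrix of subgame G_t
        sigma_r = solve_zero_sum(sub)              # row equilibrium of G_t
        sigma_c = solve_zero_sum(-sub.T)           # column equilibrium of G_t
        # Best responses over the FULL action sets against these mixtures.
        br_row = int(np.argmax(A[:, cols] @ sigma_c))
        br_col = int(np.argmin(sigma_r @ A[rows, :]))
        grew = False
        if br_row not in rows: rows.append(br_row); grew = True
        if br_col not in cols: cols.append(br_col); grew = True
        if not grew:                               # no new best response: done
            break
    return rows, cols, sigma_r, sigma_c
```

Running double_oracle on the Rock-Paper-Scissors matrix from the earlier snippet should add all three actions and recover the uniform equilibrium mixture.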

To address generalization and scalability, the paper introduces two variants of this idea: Policy-Space Response Oracles (PSRO) and Deep Cognitive Hierarchies (DCH).

Policy-Space Response Oracles (PSRO)

PSRO is an extension of Double Oracle in which, at each iteration, the meta-game is played over policies rather than individual actions. Starting from an initial policy for each player, the expected utilities of the selected policies are estimated, and the meta-strategy is then computed as a distribution over these policies.

The meta-game begins with a single policy and grows incrementally over epochs. It relies on sub-routines, known as oracles, which compute (approximate) best responses to the strategies employed by the other players.

However, this algorithm faces a significant scalability challenge: in every epoch, for every player, many episodes are needed to train a new policy π and to recompute the meta-strategy from each player's data. Instead of computing the exact best response, an approximate best response is therefore obtained with reinforcement learning (RL).


The normal-form game is given by the triple (policies, utility functions, number of players). The input (INPUT) is the set of policies of each player, and the output (OUTPUT) is a solution for each player, i.e. a meta-strategy. Viewed as a whole, the algorithm works within a reinforcement learning framework: it expands the players' policy sets and refines them following the Double Oracle scheme, and the final output is a mixture, a distribution over the computed policies.
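The overall loop can be summarized in a deliberately abstract sketch. The names psro, simulate, oracle, and meta_solver are placeholders: in the paper, the oracle is a deep RL procedure that trains an approximate best response, and the meta-solver can be, for instance, projected replicator dynamics.

```python
import numpy as np

def psro(initial_policies, simulate, oracle, meta_solver, epochs=10):
    """Minimal PSRO sketch (structure only, not the paper's reference code).

    initial_policies : one starting policy per player
    simulate(joint)  : expected utility vector of a joint policy (by simulation)
    oracle(k, populations, meta) : approximate best response for player k
                                   against the other players' meta-strategies
    meta_solver(M)   : meta-strategies from the empirical payoff tensor M
    """
    n = len(initial_policies)
    populations = [[p] for p in initial_policies]
    for _ in range(epochs):
        # 1. Empirical meta-game: estimate utilities of every joint policy.
        shape = tuple(len(pop) for pop in populations)
        M = np.zeros(shape + (n,))
        for idx in np.ndindex(*shape):
            joint = [populations[k][idx[k]] for k in range(n)]
            M[idx] = simulate(joint)
        # 2. Meta-strategies: mixtures over each player's current population.
        meta = meta_solver(M)
        # 3. Oracle step: add one approximate best response per player.
        for k in range(n):
            populations[k].append(oracle(k, populations, meta))
    return populations, meta

# When the populations contain only pure actions and meta_solver returns an
# exact Nash equilibrium, this loop reduces to the Double Oracle algorithm.
```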

Deep Cognitive Hierarchies (DCH)

DCH approximates PSRO, trading accuracy for scalability. The reinforcement learning step takes a long time to converge to a satisfactory result, so DCH runs a parallel form of PSRO: a number of levels k is chosen in advance, each level trains a single meta-policy in its own process, all processes run concurrently, and the policies and meta-strategies are periodically saved to disk so that the other levels can load them during their own reinforcement learning steps.
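Ignoring the parallelism and the disk synchronization, the level structure can be sketched sequentially as follows; dch_levels, train_best_response, and meta_solver are illustrative names, and a faithful implementation would run one process per level concurrently.

```python
def dch_levels(k, initial_policy, train_best_response, meta_solver):
    """Sequential sketch of the DCH level structure.

    Level 0 holds the initial policy; every higher level trains an approximate
    best response to a meta-strategy (a mixture) over the levels below it.
    The actual algorithm runs one process per level in parallel and syncs
    policies and meta-strategies through periodic saves to disk.
    """
    levels = [initial_policy]
    for _ in range(1, k):
        mixture = meta_solver(levels)      # distribution over the lower levels
        levels.append(train_best_response(levels, mixture))
    return levels
```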


To quantify overfitting during the learning phase, joint policy correlation (JPC) matrices are computed: several instances of the same experiment are trained with distinct seed values for the random number generators, and policies from different instances are then paired at evaluation time.
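A sketch of how such a matrix and the resulting overfitting measure could be computed is shown below; jpc_matrix, evaluate, and average_proportional_loss are illustrative names, with evaluate assumed to average returns over several episodes.

```python
import numpy as np

def jpc_matrix(policies_p1, policies_p2, evaluate):
    """Joint policy correlation (JPC) matrix: entry (i, j) is the mean return
    when player 1 uses the policy trained in instance i and player 2 uses the
    policy trained in instance j (instances differ only by their random seed)."""
    d = len(policies_p1)
    J = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            J[i, j] = evaluate(policies_p1[i], policies_p2[j])
    return J

def average_proportional_loss(J):
    """Reward lost when independently trained policies are mixed:
    (mean of the diagonal - mean of the off-diagonal) / mean of the diagonal."""
    d = J.shape[0]
    diag = np.trace(J) / d
    off = (J.sum() - np.trace(J)) / (d * (d - 1))
    return (diag - off) / diag
```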

Conclusions and further readings

Thanks for reading this far! By now, you should be acquainted with the Double Oracle algorithm, Policy-Space Response Oracles, and Deep Cognitive Hierarchies. Some concepts, such as meta-solvers, were left out of this article not because they are unnecessary, but because they mainly concern reducing the algorithms' computational complexity.

For further reading, A Generalised Method for Empirical Game Theoretic Analysis and work on asymmetric multi-agent reinforcement learning can provide additional insight into identifying asymmetric Nash equilibria in pure strategy spaces.
