
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

January 22, 2025
作者: Alsu Sagirova, Yuri Kuratov, Mikhail Burtsev
cs.AI

Abstract
Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need for explicit prediction of the agents' behavior to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor and on a POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to longer corridors than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into the transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.
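The core mechanism the abstract describes is that each agent keeps a personal working memory, all memories are pooled and globally broadcast, and each agent then reads from this shared pool to coordinate implicitly. The following is a minimal numpy sketch of that idea, not the authors' implementation (their code is at the GitHub link above); the function name `srmt_step` and the projection matrices `Wq`, `Wk`, `Wv` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def srmt_step(memories, observations, Wq, Wk, Wv):
    """One shared-memory exchange step (illustrative sketch).

    memories:     (n_agents, d) per-agent working-memory vectors
    observations: (n_agents, d) per-agent observation embeddings
    Wq, Wk, Wv:   (d, d) learned projection matrices (random here)

    The pooled memories act as a globally broadcast shared memory:
    every agent cross-attends over the memories of all agents, so
    information is exchanged implicitly rather than by explicit
    prediction of other agents' behavior.
    """
    shared = memories                 # pooled memory, visible to all agents
    q = observations @ Wq             # each agent queries from its observation
    k = shared @ Wk                   # keys over the shared memory
    v = shared @ Wv                   # values over the shared memory
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)   # (n_agents, n_agents) read weights
    read = attn @ v                   # what each agent reads from the pool
    return memories + read            # simple residual memory update
```

In an actual recurrent rollout, this exchange would be applied at every environment step, with the updated memories carried forward as the agents' recurrent state.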
