SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Abstract

Multi-agent reinforcement learning (MARL) has made significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need to explicitly predict the agents' behavior in order to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT), which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor and on a POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to corridors longer than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.
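The pooling-and-broadcast idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the linked repository is the reference for that). It assumes, purely for illustration, that each agent holds a single memory vector and reads the pooled memories of all agents through single-head dot-product attention with no learned projections. The names `read_shared_memory`, `hidden`, and `memories` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def read_shared_memory(hidden, memories):
    """Let every agent attend over the pooled memories of all agents.

    hidden:   (n_agents, d) per-agent query vectors (illustrative assumption)
    memories: (n_agents, d) one working-memory vector per agent; stacking
              them forms the shared pool that is visible to every agent
    Returns the per-agent context (n_agents, d) and the attention
    weights (n_agents, n_agents).
    """
    d = hidden.shape[-1]
    scores = hidden @ memories.T / np.sqrt(d)  # similarity of each agent to each memory
    weights = softmax(scores, axis=-1)         # each row sums to 1
    context = weights @ memories               # weighted mix of all agents' memories
    return context, weights


rng = np.random.default_rng(0)
n_agents, d = 4, 8
hidden = rng.normal(size=(n_agents, d))
memories = rng.normal(size=(n_agents, d))

context, weights = read_shared_memory(hidden, memories)
print(context.shape, weights.shape)
```

In this sketch no agent predicts another agent's behavior explicitly: information flows only through the shared pool of memory vectors, which is the sense in which the exchange is implicit.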
