PokéChamp: an Expert-level Minimax Language Agent
March 6, 2025
Authors: Seth Karten, Andy Luu Nguyen, Chi Jin
cs.AI
Abstract
We introduce PokéChamp, a minimax agent powered by Large Language Models (LLMs) for Pokémon battles. Built on a general framework for two-player competitive games, PokéChamp leverages the generalist capabilities of LLMs to enhance minimax tree search. Specifically, LLMs replace three key modules: (1) player action sampling, (2) opponent modeling, and (3) value function estimation, enabling the agent to effectively utilize gameplay history and human knowledge to reduce the search space and address partial observability. Notably, our framework requires no additional LLM training. We evaluate PokéChamp in the popular Gen 9 OU format. When powered by GPT-4o, it achieves a win rate of 76% against the best existing LLM-based bot and 84% against the strongest rule-based bot, demonstrating its superior performance. Even with an open-source 8-billion-parameter Llama 3.1 model, PokéChamp consistently outperforms the previous best LLM-based bot, Pokéllmon powered by GPT-4o, with a 64% win rate. PokéChamp attains a projected Elo of 1300-1500 on the Pokémon Showdown online ladder, placing it among the top 30%-10% of human players. In addition, this work compiles the largest real-player Pokémon battle dataset, featuring over 3 million games, including more than 500k high-Elo matches. Based on this dataset, we establish a series of battle benchmarks and puzzles to evaluate specific battling skills. We further provide key updates to the local game engine. We hope this work fosters further research that leverages Pokémon battles as a benchmark to integrate LLM technologies with game-theoretic algorithms addressing general multi-agent problems. Videos, code, and the dataset are available at https://sites.google.com/view/pokechamp-llm.
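To make the three-module design concrete, below is a minimal sketch of a depth-limited minimax search in which an LLM supplies (1) the player's candidate actions, (2) the predicted opponent replies, and (3) the leaf value estimates. It is not the authors' implementation; the interfaces `sample_player`, `sample_opponent`, `estimate_value`, and `step` are hypothetical stand-ins for LLM prompts and the game engine.

```python
# Sketch only: LLM-augmented minimax under assumed interfaces, not PokéChamp's actual API.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class BattleState:
    history: str          # textual battle log that would be fed to the LLM
    is_terminal: bool = False
    payoff: float = 0.0   # +1 win, -1 loss from the agent's perspective


# Hypothetical LLM-backed modules (assumed signatures).
ActionSampler = Callable[[BattleState, int], List[str]]   # state, k -> k candidate actions
ValueFn = Callable[[BattleState], float]                   # state -> estimated value
Transition = Callable[[BattleState, str, str], BattleState]  # simultaneous actions -> next state


def llm_minimax(state: BattleState,
                depth: int,
                sample_player: ActionSampler,    # (1) player action sampling
                sample_opponent: ActionSampler,  # (2) opponent modeling
                estimate_value: ValueFn,         # (3) value function estimation
                step: Transition,
                k: int = 3) -> Tuple[float, str]:
    """Depth-limited minimax over LLM-pruned action sets."""
    if state.is_terminal:
        return state.payoff, ""
    if depth == 0:
        # Leaf nodes are scored by the LLM instead of a hand-crafted heuristic.
        return estimate_value(state), ""

    best_value, best_action = float("-inf"), ""
    for my_action in sample_player(state, k):           # LLM priors prune the branching factor
        worst = float("inf")
        for opp_action in sample_opponent(state, k):     # predicted opponent responses
            child = step(state, my_action, opp_action)
            value, _ = llm_minimax(child, depth - 1, sample_player,
                                   sample_opponent, estimate_value, step, k)
            worst = min(worst, value)                    # opponent assumed to minimize
        if worst > best_value:                           # agent maximizes the worst case
            best_value, best_action = worst, my_action
    return best_value, best_action
```

In this reading, the LLM's role is to keep the tree tractable (only k candidate actions per side, informed by game history and human knowledge) while the minimax backup keeps the final decision grounded in adversarial planning rather than a single LLM completion.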