TextArena
April 15, 2025
Authors: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan
cs.AI
Abstract
TextArena is an open-source collection of competitive text-based games for
training and evaluating agentic behavior in Large Language Models (LLMs). It
spans 57+ unique environments (including single-player, two-player, and
multi-player setups) and allows for easy evaluation of model capabilities via
an online-play system (against humans and other submitted models) with
real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social
skills such as negotiation, theory of mind, and deception, creating a gap that
TextArena addresses. Designed with research, community, and extensibility in
mind, TextArena emphasizes ease of adding new games, adapting the framework,
testing models, playing against the models, and training models. Detailed
documentation of environments, games, leaderboard, and examples is available
at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
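To make the real-time TrueSkill scoring mentioned in the abstract more concrete, here is a minimal sketch of the standard TrueSkill update for a single two-player game with no draws. It is an illustration only: the abstract does not describe how TextArena computes its scores, so the `Rating` class, the `update_two_player` function, and the default values (mu = 25, sigma = 25/3, beta = 25/6, the conventional TrueSkill defaults) are assumptions, and the sketch omits the dynamics factor and draw margin that full implementations usually include.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """Hypothetical container for a player's skill belief (mean and std dev)."""
    mu: float = 25.0
    sigma: float = 25.0 / 3.0


BETA = 25.0 / 6.0  # assumed per-game performance noise (conventional default)


def _pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x: float) -> float:
    """Standard normal cumulative distribution, via erfc for accuracy in the tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def update_two_player(winner: Rating, loser: Rating, beta: float = BETA):
    """Return updated (winner, loser) ratings after one decisive game.

    Simplified two-player, no-draw TrueSkill update. For extreme upsets
    (t far below zero) _cdf(t) can underflow; a production version would guard that.
    """
    c2 = 2.0 * beta ** 2 + winner.sigma ** 2 + loser.sigma ** 2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c   # normalized skill gap
    v = _pdf(t) / _cdf(t)            # mean shift factor (large for upsets)
    w = v * (v + t)                  # variance shrink factor, in (0, 1)

    new_winner = Rating(
        mu=winner.mu + winner.sigma ** 2 / c * v,
        sigma=winner.sigma * math.sqrt(1.0 - winner.sigma ** 2 / c2 * w),
    )
    new_loser = Rating(
        mu=loser.mu - loser.sigma ** 2 / c * v,
        sigma=loser.sigma * math.sqrt(1.0 - loser.sigma ** 2 / c2 * w),
    )
    return new_winner, new_loser


if __name__ == "__main__":
    model, human = update_two_player(Rating(), Rating())
    print(model, human)
```

Each game moves the winner's mean up and the loser's mean down, by more when the result is surprising, and shrinks both uncertainties, which is what lets a leaderboard update after every online match rather than in batches.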