TextArena

April 15, 2025
Authors: Leon Guertler, Bobby Cheng, Simon Yu, Bo Liu, Leshem Choshen, Cheston Tan
cs.AI

Abstract

TextArena is an open-source collection of competitive text-based games for training and evaluating agentic behavior in Large Language Models (LLMs). It spans 57+ unique environments (including single-player, two-player, and multi-player setups) and allows for easy evaluation of model capabilities via an online-play system (against humans and other submitted models) with real-time TrueSkill scores. Traditional benchmarks rarely assess dynamic social skills such as negotiation, theory of mind, and deception, creating a gap that TextArena addresses. Designed with research, community, and extensibility in mind, TextArena emphasizes ease of adding new games, adapting the framework, testing models, playing against the models, and training models. Detailed documentation of environments, games, the leaderboard, and examples is available at https://github.com/LeonGuertler/TextArena and https://www.textarena.ai/.
