

GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

October 7, 2024
Authors: Eilam Shapira, Omer Madmon, Itamar Reinman, Samuel Joseph Amouyal, Roi Reichart, Moshe Tennenholtz
cs.AI

Abstract

Large Language Models (LLMs) show significant potential in economic and strategic interactions, where communication via natural language is often prevalent. This raises key questions: Do LLMs behave rationally? Can they mimic human behavior? Do they tend to reach an efficient and fair outcome? What is the role of natural language in the strategic interaction? How do characteristics of the economic environment influence these dynamics? These questions become crucial concerning the economic and societal implications of integrating LLM-based agents into real-world data-driven systems, such as online retail platforms and recommender systems. While the ML community has been exploring the potential of LLMs in such multi-agent setups, varying assumptions, design choices and evaluation criteria across studies make it difficult to draw robust and meaningful conclusions. To address this, we introduce a benchmark for standardizing research on two-player, sequential, language-based games. Inspired by the economic literature, we define three base families of games with consistent parameterization, degrees of freedom and economic measures to evaluate agents' performance (self-gain), as well as the game outcome (efficiency and fairness). We develop an open-source framework for interaction simulation and analysis, and utilize it to collect a dataset of LLM vs. LLM interactions across numerous game configurations and an additional dataset of human vs. LLM interactions. Through extensive experimentation, we demonstrate how our framework and dataset can be used to: (i) compare the behavior of LLM-based agents to human players in various economic contexts; (ii) evaluate agents in both individual and collective performance measures; and (iii) quantify the effect of the economic characteristics of the environments on the behavior of agents.
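The abstract evaluates agents on self-gain and evaluates game outcomes on efficiency and fairness. As a rough illustration of what such measures can look like for a two-player game, here is a minimal Python sketch; the exact definitions used by GLEE are not given in the abstract, so the formulas below (total payoff as a fraction of maximum achievable welfare for efficiency, one minus the normalized payoff gap for fairness) are illustrative assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class GameOutcome:
    payoff_a: float     # agent A's realized payoff (its self-gain)
    payoff_b: float     # agent B's realized payoff (its self-gain)
    max_welfare: float  # maximum total payoff achievable in this configuration

def efficiency(o: GameOutcome) -> float:
    """Total realized payoff as a fraction of the maximum achievable welfare."""
    return (o.payoff_a + o.payoff_b) / o.max_welfare

def fairness(o: GameOutcome) -> float:
    """One minus the normalized payoff gap: 1.0 means a perfectly equal split."""
    total = o.payoff_a + o.payoff_b
    if total == 0:
        return 0.0
    return 1.0 - abs(o.payoff_a - o.payoff_b) / total

# Example: agents split 10 units of a possible 12, with a 6/4 division.
outcome = GameOutcome(payoff_a=6.0, payoff_b=4.0, max_welfare=12.0)
print(efficiency(outcome))  # 10/12 ≈ 0.833
print(fairness(outcome))    # 1 - 2/10 = 0.8
```

Separating individual measures (each agent's payoff) from collective ones (efficiency, fairness of the joint outcome) mirrors the paper's distinction between agent performance and game outcome.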

