Benchmarking LLMs for Political Science: A United Nations Perspective

February 19, 2025
Authors: Yueqing Liang, Liangwei Yang, Chen Wang, Congying Xia, Rui Meng, Xiongxiao Xu, Haoran Wang, Ali Payani, Kai Shu
cs.AI

Abstract

Large Language Models (LLMs) have achieved significant advances in natural language processing, yet their potential for high-stakes political decision-making remains largely unexplored. This paper addresses the gap by focusing on the application of LLMs to the United Nations (UN) decision-making process, where the stakes are particularly high and political decisions can have far-reaching consequences. We introduce a novel dataset comprising publicly available UN Security Council (UNSC) records from 1994 to 2024, including draft resolutions, voting records, and diplomatic speeches. Using this dataset, we propose the United Nations Benchmark (UNBench), the first comprehensive benchmark designed to evaluate LLMs across four interconnected political science tasks: co-penholder judgment, representative voting simulation, draft adoption prediction, and representative statement generation. These tasks span the three stages of the UN decision-making process (drafting, voting, and discussing) and aim to assess LLMs' ability to understand and simulate political dynamics. Our experimental analysis demonstrates the potential and challenges of applying LLMs in this domain, providing insights into their strengths and limitations in political science. This work contributes to the growing intersection of AI and political science, opening new avenues for research and practical applications in global governance. The UNBench Repository can be accessed at: https://github.com/yueqingliang1/UNBench.
