MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf

Abstract

In contemporary workplaces, meetings are essential for exchanging ideas and ensuring team alignment, but they are often time-consuming, hard to schedule, and marked by inefficient participation. Recent advances in Large Language Models (LLMs) have demonstrated strong capabilities in natural language generation and reasoning, prompting the question: can LLMs effectively act as delegates for meeting participants? To explore this, we develop a prototype LLM-powered meeting delegate system and create a comprehensive benchmark using real meeting transcripts. Our evaluation reveals that GPT-4/4o maintain balanced performance between active and cautious engagement strategies. In contrast, Gemini 1.5 Pro tends to be more cautious, while Gemini 1.5 Flash and Llama3-8B/70B display more active tendencies. Overall, about 60% of responses address at least one key point from the ground truth. However, improvements are needed to reduce irrelevant or repetitive content and to increase tolerance for the transcription errors commonly found in real-world settings. Additionally, we deploy the system in practical settings and collect real-world feedback from demos. Our findings underscore both the potential and the challenges of using LLMs as meeting delegates, offering valuable insights into their practical application for alleviating the burden of meetings.
