
IHEval: Evaluating Language Models on Following the Instruction Hierarchy

February 12, 2025
Authors: Zhihan Zhang, Shiyang Li, Zixuan Zhang, Xin Liu, Haoming Jiang, Xianfeng Tang, Yifan Gao, Zheng Li, Haodong Wang, Zhaoxuan Tan, Yichuan Li, Qingyu Yin, Bing Yin, Meng Jiang
cs.AI

Abstract

The instruction hierarchy, which establishes a priority order from system messages to user messages, conversation history, and tool outputs, is essential for ensuring consistent and safe behavior in language models (LMs). Despite its importance, this topic receives limited attention, and there is a lack of comprehensive benchmarks for evaluating models' ability to follow the instruction hierarchy. We bridge this gap by introducing IHEval, a novel benchmark comprising 3,538 examples across nine tasks, covering cases where instructions in different priorities either align or conflict. Our evaluation of popular LMs highlights their struggle to recognize instruction priorities. All evaluated models experience a sharp performance decline when facing conflicting instructions, compared to their original instruction-following performance. Moreover, the most competitive open-source model only achieves 48% accuracy in resolving such conflicts. Our results underscore the need for targeted optimization in the future development of LMs.
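The priority order described above can be made concrete with a small sketch. The following Python snippet is a hypothetical illustration (not from the paper or its released benchmark): it encodes the four-level hierarchy from the abstract and shows a conflict case in which the system message and the user message give contradictory instructions, where a hierarchy-following model should obey the higher-priority one.

```python
# Priority order from the abstract, highest to lowest.
# (Illustrative only; role names and structure are assumptions, not IHEval's format.)
PRIORITY = ["system", "user", "history", "tool"]

def effective_instruction(messages):
    """Return the instruction from the highest-priority role present.

    `messages` is a list of (role, instruction) pairs; a lower index
    in PRIORITY means a higher priority in the hierarchy.
    """
    return min(messages, key=lambda m: PRIORITY.index(m[0]))[1]

# A conflict case: the user instruction contradicts the system instruction.
conflict_example = [
    ("system", "Always respond in English."),
    ("user", "Reply to me only in French."),
]

# The system message outranks the user message, so a model that follows
# the instruction hierarchy should keep responding in English.
print(effective_instruction(conflict_example))
```

Under this reading, the 48% accuracy figure means that even the strongest open-source model resolves conflicts like the one above correctly less than half the time.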
