HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs
March 3, 2025
Authors: Tin Nguyen, Logan Bolton, Mohammad Reza Taesiri, Anh Totti Nguyen
cs.AI
Abstract
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements poses a challenge for humans to verify and to base their decisions on accurately. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, LLMs first reformat the question to add XML tags highlighting key facts, and then generate a response with highlights over the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) on 17 tasks ranging from arithmetic and reading comprehension to logical reasoning. When asking humans to verify LLM responses, highlights help time-limited participants more accurately and efficiently recognize when LLMs are correct. Yet, surprisingly, when LLMs are wrong, HoT tends to make users believe that an answer is correct.
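The tagging scheme described in the abstract lends itself to a simple programmatic check: every highlighted span in the answer should match a span with the same tag in the reformatted question. Below is a minimal sketch in Python of such a check. The numbered tag names (`<fact1>`, `<fact2>`, ...), the example question, and the checker itself are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Matches spans of the form <factN>...</factN>, assuming numbered fact tags.
TAG_RE = re.compile(r"<(fact\d+)>(.*?)</\1>")


def grounded_spans(text: str) -> dict[str, str]:
    """Map each fact tag in the text to the span it highlights."""
    return {tag: span for tag, span in TAG_RE.findall(text)}


def check_grounding(reformatted_question: str, answer: str) -> bool:
    """Return True if every tagged span in the answer matches the span
    carrying the same tag in the reformatted question."""
    q_spans = grounded_spans(reformatted_question)
    a_spans = grounded_spans(answer)
    return all(q_spans.get(tag) == span for tag, span in a_spans.items())


if __name__ == "__main__":
    # Hypothetical HoT-style output: the question is reformatted with fact
    # tags, and the answer reuses those tags to reference input facts.
    question = ("Tom has <fact1>4 apples</fact1> and buys "
                "<fact2>3 more</fact2>. How many apples does he have?")
    answer = ("Tom starts with <fact1>4 apples</fact1> and buys "
              "<fact2>3 more</fact2>, so he has 4 + 3 = 7. The answer is 7.")
    print(check_grounding(question, answer))  # True
```

A check like this only verifies that the highlights are consistent with the input; as the human study in the abstract suggests, consistent-looking highlights do not by themselves guarantee that the answer is correct.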