I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token

December 9, 2024
Authors: Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo
cs.AI

Abstract

Large Language Models are known to capture real-world knowledge, allowing them to excel in many downstream tasks. Despite recent advances, these models are still prone to what are commonly known as hallucinations, causing them to emit unwanted and factually incorrect text. In this work, we propose a novel calibration method that can be used to combat hallucinations. We add a special [IDK] ("I don't know") token to the model's vocabulary and introduce an objective function that shifts probability mass to the [IDK] token for incorrect predictions. This approach allows the model to express uncertainty in its output explicitly. We evaluate our proposed method across multiple model architectures and factual downstream tasks. We find that models trained with our method are able to express uncertainty in places where they would previously make mistakes while suffering only a small loss of encoded knowledge. We further perform extensive ablation studies of multiple variations of our approach and provide a detailed analysis of the precision-recall tradeoff of our method.
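To make the core idea concrete, here is a minimal PyTorch sketch of one plausible form of such an objective: a soft-target cross-entropy that, for tokens the model currently predicts incorrectly, redirects a fraction of the target probability mass to the [IDK] token. This is an illustration of the general technique, not the paper's exact loss; the function name `idk_loss` and the `shift_weight` hyperparameter are hypothetical.

```python
import torch
import torch.nn.functional as F

def idk_loss(logits: torch.Tensor,
             targets: torch.Tensor,
             idk_token_id: int,
             shift_weight: float = 0.5) -> torch.Tensor:
    """Soft-target cross-entropy that shifts probability mass to [IDK].

    logits:       (batch, vocab_size) next-token logits
    targets:      (batch,) gold next-token ids
    idk_token_id: id of the [IDK] token appended to the vocabulary
    shift_weight: fraction of target mass moved to [IDK] on errors
                  (hypothetical hyperparameter, not from the paper)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    wrong = logits.argmax(dim=-1).ne(targets)  # positions the model gets wrong

    # Build soft targets: one-hot on the gold token, except that for
    # incorrect predictions part of the mass is shifted to [IDK].
    soft_targets = torch.zeros_like(log_probs)
    soft_targets[torch.arange(targets.size(0)), targets] = 1.0
    soft_targets[wrong, targets[wrong]] = 1.0 - shift_weight
    soft_targets[wrong, idk_token_id] = shift_weight

    return -(soft_targets * log_probs).sum(dim=-1).mean()
```

In a setup like this, the [IDK] token would be registered before fine-tuning with standard Hugging Face calls such as `tokenizer.add_special_tokens({"additional_special_tokens": ["[IDK]"]})` followed by `model.resize_token_embeddings(len(tokenizer))`; at inference time, a high [IDK] probability can then be read as an explicit abstention signal.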
