I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
December 9, 2024
Authors: Roi Cohen, Konstantin Dobler, Eden Biran, Gerard de Melo
cs.AI
Abstract
Large Language Models are known to capture real-world knowledge, allowing
them to excel in many downstream tasks. Despite recent advances, these models
are still prone to what are commonly known as hallucinations, causing them to
emit unwanted and factually incorrect text. In this work, we propose a novel
calibration method that can be used to combat hallucinations. We add a special
[IDK] ("I don't know") token to the model's vocabulary and introduce an
objective function that shifts probability mass to the [IDK] token for
incorrect predictions. This approach allows the model to express uncertainty in
its output explicitly. We evaluate our proposed method across multiple model
architectures and factual downstream tasks. We find that models trained with
our method are able to express uncertainty in places where they would
previously make mistakes while suffering only a small loss of encoded
knowledge. We further perform extensive ablation studies of multiple variations
of our approach and provide a detailed analysis of the precision-recall
tradeoff of our method.
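The abstract does not spell out the objective function, but the core idea, redirecting probability mass to the [IDK] token whenever the model's prediction is wrong, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's actual loss: the function name idk_loss, the shift_weight interpolation factor, and the argmax-based correctness check are hypothetical choices introduced here for clarity.

```python
import torch
import torch.nn.functional as F

def idk_loss(logits, targets, idk_token_id, shift_weight=0.5):
    """Hypothetical sketch of an [IDK]-style calibration objective.

    logits:  (batch, vocab_size) next-token logits, where one extra
             vocabulary slot has been reserved for the [IDK] token.
    targets: (batch,) gold next-token ids.

    When the model's argmax prediction is wrong, part of the target
    probability mass is redirected to the [IDK] token; when it is
    right, the loss is ordinary cross-entropy on the gold token.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    predictions = logits.argmax(dim=-1)
    incorrect = predictions.ne(targets)  # (batch,) bool mask

    # Standard negative log-likelihood of the gold token.
    gold_nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)

    # Negative log-likelihood of the [IDK] token.
    idk_nll = -log_probs[:, idk_token_id]

    # For incorrect predictions, interpolate mass toward [IDK];
    # for correct ones, keep the plain language-modeling loss.
    mixed = (1.0 - shift_weight) * gold_nll + shift_weight * idk_nll
    loss = torch.where(incorrect, mixed, gold_nll)
    return loss.mean()
```

In practice, a setup like this would also require extending the tokenizer and the model's input and output embedding matrices by one row for the new [IDK] token before training with such an objective.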