Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences

Abstract

Language models (LMs) should provide reliable confidence estimates to help users detect mistakes in their outputs and defer to human experts when necessary. Asking a language model to assess its confidence ("Score your confidence from 0-1.") is a natural way of evaluating its uncertainty. However, models struggle to provide absolute assessments of confidence (i.e., judging confidence in answering a question independent of other questions), and the coarse-grained scores they produce are not useful for evaluating the correctness of their answers. We propose relative confidence estimation, where we match up questions against each other and ask the model to make relative judgments of confidence ("Which question are you more confident in answering correctly?"). Treating each question as a "player" in a series of matchups against other questions and the model's preferences as match outcomes, we can use rank aggregation methods like Elo rating and Bradley-Terry to translate the model's confidence preferences into confidence scores. We evaluate relative confidence estimation against absolute confidence estimation and self-consistency confidence methods on five state-of-the-art LMs (GPT-4, GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet, and Llama 3.1 405B) across 14 challenging STEM, social science, and commonsense reasoning question-answering tasks. Our results demonstrate that relative confidence estimation consistently provides more reliable confidence scores than absolute confidence estimation, with average gains of 3.5% in selective classification AUC over direct absolute confidence estimation methods and 1.7% over self-consistency approaches across all models and datasets.
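
Elo rating, one of the rank aggregation methods named above, can be illustrated with a minimal sketch. This is not the paper's implementation: the helper `elo_scores`, the chess-style defaults (K-factor 32, starting rating 1000, logistic scale 400), and the example preferences are all assumptions. Each `(winner, loser)` pair stands for one hypothetical model answer to "Which question are you more confident in answering correctly?".

```python
from typing import Dict, Hashable, Iterable, Tuple


def elo_scores(
    matchups: Iterable[Tuple[Hashable, Hashable]],
    k: float = 32.0,  # assumed chess-style K-factor
    initial: float = 1000.0,  # assumed starting rating for every question
) -> Dict[Hashable, float]:
    """Hypothetical helper: Elo ratings from (winner, loser) confidence preferences.

    A pair (a, b) means the model said it was more confident in answering
    question a correctly than question b.
    """
    ratings: Dict[Hashable, float] = {}
    for winner, loser in matchups:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        # Probability that `winner` is preferred under the Elo logistic model.
        expected_w = 1.0 / (1.0 + 10 ** ((r_l - r_w) / 400.0))
        # Zero-sum update: the winner gains exactly what the loser loses.
        delta = k * (1.0 - expected_w)
        ratings[winner] = r_w + delta
        ratings[loser] = r_l - delta
    return ratings


# Hypothetical preferences: most confident about q1, least about q3.
prefs = [("q1", "q2"), ("q2", "q3"), ("q1", "q3"), ("q1", "q2")]
ratings = elo_scores(prefs)
ranked = sorted(ratings, key=ratings.get, reverse=True)
print(ranked)  # → ['q1', 'q2', 'q3']
```

Because this form of Elo updates ratings one match at a time, the final scores can depend on the order in which matchups are processed; averaging over several shuffled orderings is one possible mitigation, assumed here rather than taken from the paper.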
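
Bradley-Terry, the other aggregation method mentioned, models the probability that question i is preferred over question j as p_i / (p_i + p_j) and fits all strengths jointly, so the result does not depend on matchup order. The sketch below is an assumption, not the paper's solver: the hypothetical helper `bradley_terry_scores` fits the strengths by maximum likelihood with the classic minorization-maximization (MM) fixed-point update and returns log-strengths as confidence scores. It assumes every question wins at least one matchup and that the comparison graph is connected; otherwise the maximum-likelihood estimate is not finite.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple


def bradley_terry_scores(
    matchups: List[Tuple[Hashable, Hashable]],
    iters: int = 500,
    tol: float = 1e-10,
) -> Dict[Hashable, float]:
    """Hypothetical helper: Bradley-Terry log-strengths from (winner, loser) pairs."""
    wins: Dict[Hashable, int] = defaultdict(int)
    # n_pair[(i, j)] counts all comparisons between i and j (stored both ways).
    n_pair: Dict[Tuple[Hashable, Hashable], int] = defaultdict(int)
    items: Set[Hashable] = set()
    for winner, loser in matchups:
        wins[winner] += 1
        n_pair[(winner, loser)] += 1
        n_pair[(loser, winner)] += 1
        items.update((winner, loser))

    strength = {i: 1.0 for i in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                n / (strength[i] + strength[j])
                for (a, j), n in n_pair.items()
                if a == i
            )
            updated[i] = wins[i] / denom
        # Strengths are identified only up to scale; fix their geometric mean at 1.
        log_mean = sum(math.log(v) for v in updated.values()) / len(updated)
        updated = {i: v / math.exp(log_mean) for i, v in updated.items()}
        change = max(abs(updated[i] - strength[i]) for i in items)
        strength = updated
        if change < tol:
            break
    return {i: math.log(v) for i, v in strength.items()}


# Hypothetical, slightly noisy preferences; each pair of questions meets 3 times.
prefs = [
    ("q1", "q2"), ("q1", "q2"), ("q2", "q1"),
    ("q2", "q3"), ("q2", "q3"), ("q3", "q2"),
    ("q1", "q3"), ("q1", "q3"), ("q3", "q1"),
]
scores = bradley_terry_scores(prefs)
print(sorted(scores, key=scores.get, reverse=True))  # → ['q1', 'q2', 'q3']
```

Unlike a sequential Elo pass, this fit uses every comparison at once, which makes it robust to the order in which the model's preferences were collected.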
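
The reported gains are measured in selective classification AUC. Its exact definition in the paper is not given here; one common formulation, assumed in this sketch, ranks answers from most to least confident and averages the accuracy obtained when answering only the top k questions, for every k from 1 to n. The helper name `selective_accuracy_auc` and the example data are hypothetical. Because the metric depends only on the ordering of the scores, raw Elo or Bradley-Terry outputs can be used without rescaling them to [0, 1].

```python
from typing import Sequence


def selective_accuracy_auc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Hypothetical helper: area under the accuracy-coverage curve.

    Approximated as the mean selective accuracy at coverages 1/n, 2/n, ..., n/n.
    Ties in confidence are broken by input order (Python's sort is stable).
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        hits += int(correct[i])
        total += hits / k  # accuracy when answering only the k most confident questions
    return total / len(order)


labels = [True, False, True, False]
good = selective_accuracy_auc([0.9, 0.2, 0.7, 0.4], labels)  # correct answers ranked first
bad = selective_accuracy_auc([0.1, 0.8, 0.3, 0.6], labels)   # correct answers ranked last
print(round(good, 4), round(bad, 4))  # → 0.7917 0.2083
```

Both scorers reach the same 50% accuracy at full coverage; the AUC separates them only through how well confidence orders correct answers ahead of incorrect ones, which is the property relative confidence estimation targets.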
