Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training

Abstract

As large language models (LLMs) are deployed across more industries, concerns about their reliability have grown, particularly regarding hallucinations: outputs that are factually inaccurate or irrelevant to user input. Existing research focuses primarily on post hoc detection and mitigation strategies. To address this gap, our research investigates the relationship between the training process and the emergence of hallucinations. Using models from the Pythia suite (70M to 12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore LLM internal dynamics. We introduce SEnsitive Neuron Dropout (SeND), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SeND achieves this by deterministically dropping neurons with significant variability on a dataset, referred to as Sensitive Neurons. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at 2x speed. This efficient metric is integrated into our protocol, allowing SeND to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training, while also providing an efficient method to improve factual accuracy when adapting LLMs to domains such as Wikipedia and medical datasets.
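The idea of deterministically dropping high-variability neurons can be illustrated with a minimal sketch. The abstract does not say how variability is measured, which layer is used, or how many neurons are dropped, so everything below is an assumption made for illustration: variability is taken to be the per-neuron variance of activations over a dataset, and `drop_fraction`, `sensitive_neuron_mask`, and `apply_mask` are hypothetical names rather than the authors' API.

```python
import numpy as np


def sensitive_neuron_mask(activations, drop_fraction=0.2):
    """Return a 0/1 mask that zeroes the highest-variance neurons.

    activations: array of shape (num_samples, num_neurons).
    drop_fraction: hypothetical hyperparameter (not from the source),
        the fraction of neurons treated as "sensitive".
    """
    activations = np.asarray(activations, dtype=float)
    num_neurons = activations.shape[1]
    num_drop = int(round(drop_fraction * num_neurons))

    # Assumption: "variability" is the variance of each neuron over the dataset.
    variability = activations.var(axis=0)

    mask = np.ones(num_neurons)
    if num_drop > 0:
        # A stable sort keeps tie-breaking deterministic, unlike random dropout.
        order = np.argsort(-variability, kind="stable")
        mask[order[:num_drop]] = 0.0
    return mask


def apply_mask(activations, mask):
    """Zero out the masked neurons for every sample."""
    return np.asarray(activations, dtype=float) * mask


# Toy activations: 3 samples, 4 neurons; neuron 2 varies the most.
acts = np.array([
    [0.0, 1.0, 5.0, 2.0],
    [0.1, 1.0, -5.0, 2.5],
    [0.0, 1.0, 4.0, 1.5],
])
mask = sensitive_neuron_mask(acts, drop_fraction=0.25)
dropped = apply_mask(acts, mask)
```

Unlike standard dropout, the selection here involves no randomness: the same activations always yield the same mask. A training loop would presumably recompute the mask periodically, but that schedule is not described in this section.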
