

Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training

October 20, 2024
作者: Shahrad Mohammadzadeh, Juan David Guerra, Marco Bonizzato, Reihaneh Rabbany, Golnoosh Farnadi
cs.AI

Abstract

As large language models (LLMs) become increasingly deployed across various industries, concerns about their reliability have grown, particularly due to hallucinations: outputs that are factually inaccurate or irrelevant to user input. Our research investigates the relationship between the training process and the emergence of hallucinations, addressing a key gap in existing work, which focuses primarily on post hoc detection and mitigation strategies. Using models from the Pythia suite (70M-12B parameters) and several hallucination detection metrics, we analyze hallucination trends throughout training and explore the internal dynamics of LLMs. We introduce SEnsitive Neuron Dropout (SeND), a novel training protocol designed to mitigate hallucinations by reducing variance during training. SeND achieves this by deterministically dropping neurons that exhibit significant variability on a dataset, referred to as Sensitive Neurons. In addition, we develop an unsupervised hallucination detection metric, Efficient EigenScore (EES), which approximates the traditional EigenScore at twice the speed. This efficient metric is integrated into our protocol, allowing SeND to be both computationally scalable and effective at reducing hallucinations. Our empirical evaluation demonstrates that our approach improves LLM reliability at test time by up to 40% compared to normal training, while also providing an efficient method to improve factual accuracy when adapting LLMs to domains such as Wikipedia and medical datasets.
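
The abstract does not give implementation details, but the idea of deterministically dropping "sensitive" neurons can be illustrated with a minimal PyTorch sketch. The sketch below assumes that hidden activations are collected over a small evaluation set during training and that the neurons with the highest per-neuron variance are masked out; the class and parameter names (`SensitiveNeuronDropout`, `drop_fraction`, `update_mask`) are hypothetical and not taken from the authors' code.

```python
import torch
import torch.nn as nn

class SensitiveNeuronDropout(nn.Module):
    """Illustrative sketch: deterministically zero out the most variable
    ("sensitive") neurons, rather than dropping neurons at random."""

    def __init__(self, hidden_size: int, drop_fraction: float = 0.01):
        super().__init__()
        self.drop_fraction = drop_fraction
        # Mask over neurons: 1 = keep, 0 = drop (all kept initially).
        self.register_buffer("keep_mask", torch.ones(hidden_size))

    @torch.no_grad()
    def update_mask(self, activations: torch.Tensor) -> None:
        """activations: (num_samples, hidden_size) hidden states gathered
        on an evaluation set at some point during training."""
        variance = activations.var(dim=0)             # per-neuron variance
        k = int(self.drop_fraction * variance.numel())
        sensitive = torch.topk(variance, k).indices   # most variable neurons
        mask = torch.ones_like(self.keep_mask)
        mask[sensitive] = 0.0
        self.keep_mask.copy_(mask)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Deterministic masking, unlike standard stochastic dropout.
        return hidden * self.keep_mask
```

In use, such a module would sit after a chosen hidden layer, with `update_mask` called periodically on freshly collected activations so that the set of dropped neurons tracks which units are currently most variable. How often the mask is refreshed, which layer is targeted, and how EES feeds into the procedure are design choices described in the paper itself, not in this sketch.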
