Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking
April 4, 2025
作者: Chris Samarinas, Hamed Zamani
cs.AI
Abstract
We present a novel approach for training small language models for
reasoning-intensive document ranking that combines knowledge distillation with
reinforcement learning optimization. While existing methods often rely on
expensive human annotations or large black-box language models, our methodology
leverages web data and a teacher LLM to automatically generate high-quality
training examples with relevance explanations. By framing document ranking as a
reinforcement learning problem and incentivizing explicit reasoning
capabilities, we train a compact 3B parameter language model that achieves
state-of-the-art performance on the BRIGHT benchmark. Our model ranks third on
the leaderboard while using substantially fewer parameters than other
approaches, outperforming models that are over 20 times larger. Through
extensive experiments, we demonstrate that generating explanations during
inference, rather than directly predicting relevance scores, enables more
effective reasoning with smaller language models. The self-supervised nature of
our method offers a scalable and interpretable solution for modern information
retrieval systems.
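The sketch below is a minimal, illustrative take (not the authors' released code) on the inference behavior the abstract describes: a pointwise re-ranker that first generates a short relevance explanation and only then emits a relevance score, which is parsed and used to sort candidates. The prompt wording, the 0-3 score scale, and the helper names (generate_fn, score_document, rerank) are assumptions introduced here for illustration.

```python
# Minimal sketch of explanation-then-score re-ranking (illustrative assumptions,
# not the paper's exact prompt or scoring scheme).
import re
from typing import Callable, List, Tuple

PROMPT = (
    "Query: {query}\n"
    "Document: {document}\n\n"
    "First explain how the document relates to the query, then output a "
    "relevance score from 0 to 3 on a final line formatted as 'Score: <number>'."
)

def score_document(
    generate_fn: Callable[[str], str], query: str, document: str
) -> Tuple[float, str]:
    """Ask the small LLM for an explanation followed by a score, and parse the score."""
    output = generate_fn(PROMPT.format(query=query, document=document))
    match = re.search(r"Score:\s*([0-3](?:\.\d+)?)", output)
    score = float(match.group(1)) if match else 0.0
    return score, output  # the free-text explanation is kept for interpretability

def rerank(
    generate_fn: Callable[[str], str], query: str, documents: List[str]
) -> List[Tuple[str, float]]:
    """Sort candidate documents by the model-assigned relevance score (descending)."""
    scored = [(doc, score_document(generate_fn, query, doc)[0]) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice, generate_fn would wrap text generation by the trained 3B re-ranker; retaining the generated explanation alongside the numeric score is what makes the ranking decisions inspectable.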