
SIFT: Grounding LLM Reasoning in Contexts via Stickers

February 19, 2025
Authors: Zihao Zeng, Xuyao Huang, Boxiu Li, Zhijie Deng
cs.AI

Abstract

This paper identifies that misinterpretation of the context can be a significant issue during the reasoning process of large language models, from smaller models like Llama3.2-3B-Instruct to cutting-edge ones like DeepSeek-R1. For example, in the phrase "10 dollars per kilo," LLMs might not recognize that "per" means "for each," leading to calculation errors. To tackle this, we introduce a novel post-training approach called **Stick to the Facts (SIFT)**. SIFT leverages increasing inference-time compute to ground LLM reasoning in contexts. At the core of SIFT lies the *Sticker*, which is generated by the model itself to explicitly emphasize the key information within the context. Given the curated Sticker, SIFT generates two predictions: one from the original query and one from the query augmented with the Sticker. If they differ, the Sticker is sequentially refined via *forward* optimization (to better align the extracted facts with the query) and *inverse* generation (to conform with the model's inherent tendencies) for more faithful reasoning outcomes. Studies across diverse models (from 3B to 100B+) and benchmarks (e.g., GSM8K, MATH-500) reveal consistent performance improvements. Notably, SIFT improves the pass@1 accuracy of DeepSeek-R1 on AIME2024 from 78.33% to **85.67%**, establishing a new state of the art in the open-source community. The code is available at https://github.com/zhijie-group/SIFT.
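The control flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `sift_sketch`, its callable parameters, the `max_rounds` budget, and the fallback of returning the Sticker-augmented prediction when no agreement is reached are all assumptions made here, and the toy stand-ins replace real LLM calls. For the actual method, see the linked repository.

```python
from typing import Callable, Optional

def sift_sketch(
    query: str,
    make_sticker: Callable[[str], str],
    predict: Callable[[str, Optional[str]], str],
    forward_refine: Callable[[str, str], str],
    inverse_generate: Callable[[str, str], str],
    max_rounds: int = 3,  # hypothetical budget; not taken from the abstract
) -> str:
    """Hypothetical sketch of the loop described in the abstract."""
    # The model itself writes a Sticker that highlights key facts in the query.
    sticker = make_sticker(query)
    for _ in range(max_rounds):
        plain = predict(query, None)         # prediction from the original query
        augmented = predict(query, sticker)  # prediction from query + Sticker
        if plain == augmented:
            return augmented                 # the two predictions agree
        # Forward optimization: align the extracted facts with the query.
        sticker = forward_refine(query, sticker)
        # Inverse generation: revise the Sticker in light of the model's own prediction.
        sticker = inverse_generate(predict(query, sticker), sticker)
    # Assumption: fall back to the Sticker-augmented prediction.
    return predict(query, sticker)

# Toy stand-ins for the model calls, built around the "per kilo" example.
def toy_make_sticker(query: str) -> str:
    return "price: 10 dollars"               # drops the "per kilo" condition

def toy_predict(query: str, sticker: Optional[str]) -> str:
    if sticker is None or "per kilo" in sticker:
        return "30"
    return "10"                              # misreads an incomplete Sticker

def toy_forward(query: str, sticker: str) -> str:
    return sticker + "; 10 dollars per kilo; quantity: 3 kilos"

def toy_inverse(prediction: str, sticker: str) -> str:
    return sticker                           # toy: keeps the Sticker unchanged

question = "Apples cost 10 dollars per kilo. How much do 3 kilos cost?"
answer = sift_sketch(question, toy_make_sticker, toy_predict, toy_forward, toy_inverse)
print(answer)
```

Passing the model calls in as plain callables keeps the sketch independent of any particular LLM client, so the same loop can be exercised with stubs like the ones above or wired to a real model.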
