CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
February 4, 2025
Authors: Yongchao Chen, Yilun Hao, Yueying Liu, Yang Zhang, Chuchu Fan
cs.AI
Abstract
Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM code/text generation. We construct a comprehensive benchmark SymBench comprising 37 symbolic tasks with adjustable complexity and also synthesize datasets of 12k multi-round guidance/generation trajectories and 5.5k guidance comparison pairs. We fine-tune the Llama-3-8B model with a newly designed multi-round supervised fine-tuning (SFT) and direct preference optimization (DPO). The resulting model, CodeSteerLLM, augmented with the proposed symbolic and self-answer checkers, effectively guides the code/text generation of larger models. Augmenting GPT-4o with CodeSteer raises its average performance score from 53.3 to 86.4, even outperforming the existing best LLM OpenAI o1 (82.7), o1-preview (74.8), and DeepSeek R1 (76.8) across all 37 tasks (28 seen, 9 unseen). Trained for GPT-4o, CodeSteer demonstrates superior generalizability, providing an average 41.8 performance boost on Claude, Mistral, and GPT-3.5. CodeSteer-guided LLMs fully harness symbolic computing to maintain strong performance on highly complex tasks. Models, Datasets, and Codes are available at https://github.com/yongchao98/CodeSteer-v1.0.
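The abstract describes CodeSteerLLM as a small guidance model that repeatedly steers a larger LLM between code generation and textual reasoning, with symbolic and self-answer checkers filtering each attempt. The following is a minimal sketch of such a multi-round steering loop, not the authors' actual implementation (see the linked repository for that); the callables `steer_model`, `task_model`, and `run_symbolic_check`, and the loop structure itself, are illustrative assumptions.

```python
# Hypothetical sketch of a CodeSteer-style multi-round guidance loop.
# steer_model      -- stands in for CodeSteerLLM (produces code/text guidance)
# task_model       -- stands in for the larger LLM being steered (e.g., GPT-4o)
# run_symbolic_check -- stands in for the symbolic checker (e.g., executes code, checks constraints)

from typing import Callable, List, Optional, Tuple

def codesteer_loop(
    question: str,
    steer_model: Callable[[str, List[Tuple[str, str, str]]], str],
    task_model: Callable[[str], str],
    run_symbolic_check: Callable[[str], bool],
    max_rounds: int = 5,
) -> Optional[str]:
    """Steer a task model over several rounds; return the first accepted answer."""
    history: List[Tuple[str, str, str]] = []  # (guidance, answer, failure reason)

    for _ in range(max_rounds):
        # 1. The guidance model reads the question and prior failed attempts,
        #    then tells the task model whether to answer with code or text (plus hints).
        guidance = steer_model(question, history)

        # 2. The larger model answers under that guidance.
        answer = task_model(f"{question}\n\nGuidance: {guidance}")

        # 3. Symbolic check: verify the generated code / symbolic solution.
        if not run_symbolic_check(answer):
            history.append((guidance, answer, "symbolic check failed"))
            continue

        # 4. Self-answer check: ask the task model to confirm its own answer.
        verdict = task_model(
            f"{question}\n\nProposed answer:\n{answer}\n\nIs this correct? Reply YES or NO."
        )
        if "YES" in verdict.upper():
            return answer
        history.append((guidance, answer, "self-answer check failed"))

    return None  # no answer accepted within the round budget
```

In this sketch, the failure history is fed back to the guidance model so later rounds can switch modality (for example, from textual reasoning to code) when earlier attempts do not pass the checkers, which mirrors the multi-round guidance/generation trajectories described in the abstract.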