GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
December 12, 2024
Authors: Sarkar Snigdha Sarathi Das, Ryo Kamoi, Bo Pang, Yusen Zhang, Caiming Xiong, Rui Zhang
cs.AI
Abstract
The effectiveness of large language models (LLMs) is closely tied to the
design of prompts, making prompt optimization essential for enhancing their
performance across a wide range of tasks. Many existing approaches to
automating prompt engineering rely exclusively on textual feedback, refining
prompts based solely on inference errors identified by large, computationally
expensive LLMs. Unfortunately, smaller models struggle to generate high-quality
feedback, resulting in complete dependence on the judgment of large LLMs.
Moreover, these methods fail to leverage more direct and finer-grained
information, such as gradients, because they operate purely in text space. To
this end, we introduce
GReaTer, a novel prompt optimization technique that directly incorporates
gradient information over task-specific reasoning. By utilizing task loss
gradients, GReaTer enables self-optimization of prompts for open-source,
lightweight language models without the need for costly closed-source LLMs.
This allows high-performance prompt optimization without dependence on massive
LLMs, closing the gap between smaller models and the sophisticated reasoning
often needed for prompt refinement. Extensive evaluations across diverse
reasoning tasks including BBH, GSM8k, and FOLIO demonstrate that GReaTer
consistently outperforms previous state-of-the-art prompt optimization methods,
even those reliant on powerful LLMs. Additionally, GReaTer-optimized prompts
frequently exhibit better transferability and, in some cases, boost task
performance to levels comparable to or surpassing those achieved by larger
language models, highlighting the effectiveness of prompt optimization guided
by gradients over reasoning. The code for GReaTer is available at
https://github.com/psunlpgroup/GreaTer.
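To make the core mechanism concrete, the following is a minimal, hypothetical sketch of gradient-guided prompt-token search in PyTorch with Hugging Face Transformers. It is not the authors' implementation (see the repository above for that); it only illustrates how a task loss computed on the answer tokens, positioned after the model's generated reasoning, can be back-propagated through a frozen open-source LM to a one-hot representation of the prompt, in the style of first-order token-search methods. The model name, function name, and top-k heuristic are illustrative assumptions.

```python
# Hypothetical sketch: back-propagate a task loss through a frozen open-source
# LM to the prompt's one-hot representation, then rank replacement tokens by
# the (negative) gradient. Illustrates the idea behind "gradients over
# reasoning"; it is not the GReaTer implementation.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.2-1B"  # any lightweight open-source causal LM
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()
for p in model.parameters():       # only the prompt representation needs grads
    p.requires_grad_(False)

def rank_replacements(prompt_ids, tail_ids, answer_len, k=8):
    """Rank top-k candidate tokens per prompt position.

    prompt_ids : (P,) ids of the instruction being optimized
    tail_ids   : (T,) ids of [task input + generated reasoning + answer]
    answer_len : number of final tokens in tail_ids that form the answer
    """
    embed = model.get_input_embeddings().weight              # (V, d)
    one_hot = F.one_hot(prompt_ids, embed.size(0)).float()
    one_hot.requires_grad_(True)
    inputs = torch.cat([one_hot @ embed,                     # differentiable
                        embed[tail_ids]], dim=0).unsqueeze(0)

    logits = model(inputs_embeds=inputs).logits[0]           # (P+T, V)
    # Task loss: next-token cross-entropy on the answer tokens only, so the
    # gradient flows back through the reasoning tokens to the prompt.
    pred = logits[-answer_len - 1:-1]                        # predicts answer
    gold = tail_ids[-answer_len:]
    loss = F.cross_entropy(pred, gold)
    loss.backward()

    # First-order approximation: the most negative gradient entries mark
    # vocabulary tokens expected to reduce the task loss if swapped in.
    return one_hot.grad.topk(k, dim=-1, largest=False).indices  # (P, k)
```

In a full optimizer one would evaluate the ranked candidates, keep swaps that actually lower the task loss, and regenerate the reasoning between iterations; the paper and repository specify GReaTer's actual procedure.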