GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
December 12, 2024
Authors: Sarkar Snigdha Sarathi Das, Ryo Kamoi, Bo Pang, Yusen Zhang, Caiming Xiong, Rui Zhang
cs.AI
Abstract
The effectiveness of large language models (LLMs) is closely tied to the
design of prompts, making prompt optimization essential for enhancing their
performance across a wide range of tasks. Many existing approaches to
automating prompt engineering rely exclusively on textual feedback, refining
prompts based solely on inference errors identified by large, computationally
expensive LLMs. Unfortunately, smaller models struggle to generate high-quality
feedback, which leaves these approaches fully dependent on the judgment of
large LLMs. Moreover, because they operate purely in text space, these methods
cannot exploit more direct and finer-grained signals such as gradients. To this
end, we introduce
GReaTer, a novel prompt optimization technique that directly incorporates
gradient information over task-specific reasoning. By utilizing task loss
gradients, GReaTer enables self-optimization of prompts for open-source,
lightweight language models without the need for costly closed-source LLMs.
This allows high-performance prompt optimization without dependence on massive
LLMs, closing the gap between smaller models and the sophisticated reasoning
often needed for prompt refinement. Extensive evaluations across diverse
reasoning tasks including BBH, GSM8k, and FOLIO demonstrate that GReaTer
consistently outperforms previous state-of-the-art prompt optimization methods,
even those reliant on powerful LLMs. Additionally, GReaTer-optimized prompts
frequently exhibit better transferability and, in some cases, boost task
performance to levels comparable to or surpassing those achieved by larger
language models, highlighting the effectiveness of prompt optimization guided
by gradients over reasoning. The code for GReaTer is available at
https://github.com/psunlpgroup/GreaTer.
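
To make the idea of using task loss gradients to pick prompt tokens more concrete, the sketch below shows a generic first-order token swap on a toy model. It is not the authors' procedure: a small linear classifier over mean-pooled token embeddings stands in for the language model, the reasoning chain is omitted entirely, and the names `loss_fn`, `token_gradients`, and `optimize_position` are hypothetical and are not taken from the GReaTer repository.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_fn(prompt_ids, E, W, target):
    """Cross-entropy of a toy classifier on mean-pooled prompt embeddings."""
    h = E[prompt_ids].mean(axis=0)       # (d,) pooled prompt representation
    return -np.log(softmax(W @ h)[target])

def token_gradients(prompt_ids, E, W, target):
    """First-order score per vocabulary token: lower means larger predicted loss drop.

    With mean pooling, every prompt position receives the same embedding
    gradient, so the scores do not depend on which position is edited.
    """
    h = E[prompt_ids].mean(axis=0)
    d_logits = softmax(W @ h)
    d_logits[target] -= 1.0              # dL/dlogits for cross-entropy
    grad_e = (W.T @ d_logits) / len(prompt_ids)  # dL/d(embedding at one position)
    return E @ grad_e                    # (V,) dot product with each token embedding

def optimize_position(prompt_ids, E, W, target, pos, k=5):
    """Try the k tokens with the most negative gradient score at `pos`,
    and keep a swap only if it lowers the true loss."""
    scores = token_gradients(prompt_ids, E, W, target)
    candidates = np.argsort(scores)[:k]
    best_ids = list(prompt_ids)
    best_loss = loss_fn(best_ids, E, W, target)
    for tok in candidates:
        trial = list(prompt_ids)
        trial[pos] = int(tok)
        trial_loss = loss_fn(trial, E, W, target)
        if trial_loss < best_loss:
            best_ids, best_loss = trial, trial_loss
    return best_ids, best_loss
```

The gradient only ranks candidates; each one is then re-scored with a real forward pass, because a first-order estimate can be inaccurate for a discrete token swap. A text-feedback optimizer has no comparable per-token signal, which is the gap the abstract points to.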