HARP: Hesitation-Aware Reframing in Transformer Inference Pass
December 10, 2024
Authors: Romain Storaï, Seung-won Hwang
cs.AI
Abstract
This paper aims to improve the performance of large language models by addressing the variable computational demands of inference steps, where some tokens require more computational resources than others. We present HARP, a simple modification to the "off-the-shelf" Transformer forward pass. Drawing on hesitation and the framing effect in decision-making, HARP selectively applies additional computation when the model encounters uncertainty during token generation. Our method mimics human cognitive processes by pausing at difficult decision points and reframing inputs from a different perspective. Unlike other approaches, HARP is model-agnostic, training-free, and easy to implement. We thoroughly evaluate our method across various downstream tasks and model sizes, demonstrating performance improvements of up to +5.16%. Notably, HARP achieves these gains while keeping inference roughly twice as fast as beam search. Simple yet delivering significant gains, HARP offers a practical solution for enhancing the performance of Transformer-based language models with minimal computational impact.
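To make the idea concrete, below is a minimal, hypothetical sketch of hesitation-aware decoding. It is not the paper's implementation: the entropy threshold, the embedding-noise "reframing", the placeholder model `gpt2`, and the function name `harp_like_generate` are all illustrative assumptions standing in for HARP's actual hesitation criterion and reframing transformation.

```python
# Minimal sketch: spend extra compute only on "hesitant" (high-entropy) steps.
# The threshold and noise-based reframing are assumptions, not HARP's exact method.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"          # placeholder model for illustration
ENTROPY_THRESHOLD = 2.0      # assumed hesitation threshold (nats)
NOISE_SCALE = 0.01           # assumed perturbation scale for the reframed pass

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def harp_like_generate(prompt: str, max_new_tokens: int = 32) -> str:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits[:, -1, :]
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)

        # "Hesitation": if the next-token distribution is uncertain,
        # run a second forward pass on a perturbed ("reframed") input
        # and combine the two perspectives.
        if entropy.item() > ENTROPY_THRESHOLD:
            embeds = model.get_input_embeddings()(input_ids)
            reframed = embeds + NOISE_SCALE * torch.randn_like(embeds)
            extra_logits = model(inputs_embeds=reframed).logits[:, -1, :]
            logits = (logits + extra_logits) / 2

        next_id = logits.argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(harp_like_generate("The capital of France is"))
```

Because the extra forward pass runs only at high-entropy steps, most tokens keep the cost of a single pass, which is consistent with the abstract's claim of minimal computational impact.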