APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
February 8, 2025
Authors: Xinyu Yang, Tianqi Chen, Beidi Chen
cs.AI
Abstract
Context-augmented generation (CAG) techniques, including RAG and ICL, require
the efficient combination of multiple contexts to generate responses to user
queries. Directly inputting these contexts as a sequence introduces a
considerable computational burden by re-encoding the combined selection of
contexts for every request. To address this, we explore the promising potential
of parallel encoding to independently pre-compute and cache each context's KV
states. This approach enables the direct loading of cached states during
inference while accommodating more contexts through position reuse across
contexts. However, due to misalignments in attention distribution, directly
applying parallel encoding results in a significant performance drop. To enable
effective and efficient CAG, we propose Adaptive Parallel Encoding
(APE), which brings shared prefix, attention temperature, and
scaling factor to align the distribution of parallel encoding with sequential
encoding. Results on RAG and ICL tasks demonstrate that APE can preserve 98%
and 93% of sequential encoding performance on the same inputs while
outperforming parallel encoding by 3.6% and 7.9%, respectively. It also scales
to many-shot CAG, effectively encoding hundreds of contexts in parallel.
Efficiency evaluation shows that APE can achieve an end-to-end 4.5× speedup
by reducing prefilling time 28× for a 128K-length context.
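To make the core idea concrete, below is a minimal, illustrative sketch of attention over independently pre-computed KV caches, with a temperature and scaling factor applied to the parallel-encoded context scores. The function name, parameter values, and exact placement of the temperature and scale are assumptions for illustration, not the paper's precise formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ape_attention(q, prefix_kv, context_kvs, temperature=0.9, scale=0.95):
    """Illustrative APE-style attention for a single query vector.

    prefix_kv:   (K, V) arrays for a shared prefix, encoded once.
    context_kvs: list of (K, V) caches, each pre-computed independently
                 (so positions are reused across contexts).
    temperature: sharpens attention over parallel-encoded context tokens.
    scale:       down-weights context scores to better align the combined
                 distribution with sequential encoding.
    Values here are hypothetical defaults, not the paper's exact settings.
    """
    d = q.shape[-1]
    pk, pv = prefix_kv
    scores = [pk @ q / np.sqrt(d)]   # shared-prefix scores at standard scale
    values = [pv]
    for ck, cv in context_kvs:
        # temperature and scaling factor applied only to parallel contexts
        scores.append(scale * (ck @ q) / (temperature * np.sqrt(d)))
        values.append(cv)
    w = softmax(np.concatenate(scores))
    v = np.concatenate(values, axis=0)
    return w @ v                      # attention output for this query
```

At serving time, the per-context `(K, V)` caches would be loaded from storage rather than recomputed, so prefill cost covers only the query tokens rather than the full concatenated context.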