SAGE: A Framework of Precise Retrieval for RAG
March 3, 2025
Authors: Jintao Zhang, Guoliang Li, Jinyang Su
cs.AI
Abstract
Retrieval-augmented generation (RAG) has demonstrated significant proficiency
in conducting question-answering (QA) tasks within a specified corpus.
Nonetheless, numerous failure instances of RAG in QA still exist. These
failures are not solely attributable to the limitations of Large Language
Models (LLMs); instead, they predominantly arise from the retrieval of
inaccurate information for LLMs due to two limitations: (1) Current RAG methods
segment the corpus without considering semantics, making it difficult to find
relevant context due to impaired correlation between questions and the
segments. (2) There is a trade-off between missing essential context when fewer
chunks are retrieved and including irrelevant context when more are retrieved.
In this paper, we introduce a RAG framework (SAGE) to overcome these
limitations. First, to address the issue of segmenting the corpus without
considering semantics, we propose training a semantic segmentation model that
splits the corpus into semantically complete chunks. Second, to
ensure that only the most relevant chunks are retrieved while the irrelevant
ones are ignored, we design a chunk selection algorithm to dynamically select
chunks based on how quickly the relevance score decreases, leading to a more
relevant selection. Third, to further ensure the precision of the retrieved
chunks, we propose letting LLMs assess whether retrieved chunks are excessive
or lacking and then adjust the amount of context accordingly. Experiments show
that SAGE outperforms baselines by 61.25% in the quality of QA on average.
Moreover, by avoiding retrieving noisy context, SAGE lowers the cost of the
tokens consumed in LLM inference and achieves a 49.41% enhancement in cost
efficiency on average. Additionally, our work offers valuable insights for
boosting RAG.
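
The semantic segmentation step can be pictured with a simple stand-in: instead of fixed-size splitting, open a new chunk whenever the next sentence no longer fits the current one semantically. SAGE trains a dedicated segmentation model for this; the sketch below only approximates the idea with an off-the-shelf sentence embedder (all-MiniLM-L6-v2) and a similarity threshold, both of which are assumptions rather than the paper's setup.

```python
# Minimal stand-in for semantic segmentation: open a new chunk whenever the
# next sentence is no longer similar enough to the running chunk. SAGE trains
# a dedicated model for this; the embedder and threshold here are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_chunks(sentences, threshold=0.6, model_name="all-MiniLM-L6-v2"):
    if not sentences:
        return []
    model = SentenceTransformer(model_name)
    embs = model.encode(sentences, normalize_embeddings=True)
    chunks, current = [], [0]
    for i in range(1, len(sentences)):
        # Compare the candidate sentence with the centroid of the current chunk.
        centroid = embs[current].mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        if float(embs[i] @ centroid) < threshold:
            chunks.append(" ".join(sentences[j] for j in current))
            current = [i]
        else:
            current.append(i)
    chunks.append(" ".join(sentences[j] for j in current))
    return chunks
```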
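
The dynamic chunk selection can likewise be sketched in a few lines: rank chunks by relevance score and keep adding them until the score falls off sharply. The drop threshold and the cap below are illustrative assumptions; the paper's exact stopping criterion based on how quickly the score decreases may differ.

```python
# Minimal sketch of dynamic chunk selection: keep top-ranked chunks until the
# relevance score starts dropping sharply. `drop_threshold` and `max_chunks`
# are illustrative assumptions, not the paper's exact values.

def select_chunks(scored_chunks, drop_threshold=0.15, max_chunks=10):
    """scored_chunks: iterable of (chunk_text, relevance_score) pairs."""
    ranked = sorted(scored_chunks, key=lambda x: x[1], reverse=True)
    if not ranked:
        return []
    selected = [ranked[0]]
    for prev, cur in zip(ranked, ranked[1:]):
        # Stop once the score falls off quickly relative to the previous chunk,
        # or once a hard cap on context size is reached.
        if prev[1] - cur[1] > drop_threshold or len(selected) >= max_chunks:
            break
        selected.append(cur)
    return [text for text, _ in selected]
```

For example, with scores [0.92, 0.88, 0.45, 0.40], the 0.43 drop after the second chunk stops the selection at two chunks, regardless of any fixed top-k.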
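
The third step, LLM self-feedback on the amount of retrieved context, amounts to a small control loop: ask the model whether the provided context was excessive, insufficient, or adequate, and nudge the retrieval budget accordingly. The prompt wording, the query_llm callable, and the step size below are placeholders, not the paper's implementation.

```python
# Minimal sketch of the self-feedback step: ask the LLM whether the retrieved
# context was too much, too little, or adequate, then adjust how many chunks
# to retrieve in the next round. `query_llm`, the prompt wording, and `step`
# are placeholders, not the paper's implementation.

def adjust_retrieval_budget(question, chunks, query_llm, k, step=2):
    context = "\n\n".join(chunks[:k])
    verdict = query_llm(
        f"Question: {question}\n\nContext:\n{context}\n\n"
        "Is this context EXCESSIVE, INSUFFICIENT, or ADEQUATE for answering "
        "the question? Reply with one word."
    ).strip().upper()
    if verdict == "EXCESSIVE":
        return max(1, k - step)   # retrieve fewer chunks next time
    if verdict == "INSUFFICIENT":
        return k + step           # retrieve more chunks next time
    return k                      # current amount is fine
```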