RARe: Retrieval Augmented Retrieval with In-Context Examples
October 26, 2024
Authors: Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi, Eunsol Choi
cs.AI
Abstract
We investigate whether in-context examples, widely used in decoder-only
language models (LLMs), can improve embedding model performance in retrieval
tasks. Unlike in LLMs, naively prepending in-context examples (query-document
pairs) to the target query at inference time does not work out of the box. We
introduce a simple approach to enable retrievers to use in-context examples.
Our approach, RARe, finetunes a pre-trained model with in-context examples
whose query is semantically similar to the target query. This can be applied to
adapt various base architectures (i.e., decoder-only language models, retriever
models) and consistently achieves performance gains of up to +2.72% nDCG across
various open-domain retrieval datasets (BeIR, RAR-b). In particular, we find
RARe exhibits stronger out-of-domain generalization compared to models using
queries without in-context examples, similar to what is seen for in-context
learning in LLMs. We further provide analysis on the design choices of
in-context example augmentation and lay the foundation for future work in this
space.
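To make the mechanism in the abstract concrete, here is a minimal sketch of RARe-style input construction: select (query, document) pairs whose query is semantically similar to the target query, then prepend them to the target query before embedding. The helper names `select_in_context_examples` and `augment_query`, the `Query:`/`Document:` template, and the top-k similarity heuristic are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of RARe-style in-context query augmentation (format assumed,
# not taken verbatim from the paper).
from typing import Callable, List, Tuple

def select_in_context_examples(
    target_query: str,
    example_pool: List[Tuple[str, str]],          # candidate (query, document) pairs
    similarity_fn: Callable[[str, str], float],   # e.g., cosine similarity of query embeddings
    k: int = 3,
) -> List[Tuple[str, str]]:
    """Pick the k pairs whose query is most similar to the target query."""
    return sorted(
        example_pool,
        key=lambda pair: similarity_fn(target_query, pair[0]),
        reverse=True,
    )[:k]

def augment_query(target_query: str, examples: List[Tuple[str, str]]) -> str:
    """Prepend the selected (query, document) pairs to the target query."""
    blocks = [f"Query: {q}\nDocument: {d}" for q, d in examples]
    blocks.append(f"Query: {target_query}")
    return "\n\n".join(blocks)
```

Per the abstract, the augmented string stands in for the raw query on the query side of retriever finetuning, and the same augmentation is applied at inference time; documents are embedded unchanged.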