Drowning in Documents: Consequences of Scaling Reranker Inference

Abstract

Rerankers, typically cross-encoders, are often used to re-score the documents retrieved by cheaper initial IR systems. This is because, though expensive, rerankers are assumed to be more effective. We challenge this assumption by measuring reranker performance for full retrieval, not just re-scoring first-stage retrieval. Our experiments reveal a surprising trend: the best existing rerankers provide diminishing returns when scoring progressively more documents and actually degrade quality beyond a certain limit. In fact, in this setting, rerankers can frequently assign high scores to documents with no lexical or semantic overlap with the query. We hope that our findings will spur future research to improve reranking.
