Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling

Abstract

Over the last few years, multi-vector retrieval methods, spearheaded by ColBERT, have become an increasingly popular approach to neural IR. By storing representations at the token level rather than at the document level, these methods have demonstrated very strong retrieval performance, especially in out-of-domain settings. However, the storage and memory needed to hold the large number of associated vectors remain an important drawback, hindering practical adoption. In this paper, we introduce a simple clustering-based token pooling approach to aggressively reduce the number of vectors that need to be stored. This method can reduce the space and memory footprint of ColBERT indexes by 50% with virtually no retrieval performance degradation. It also allows for further reductions, cutting the vector count by 66% to 75%, with degradation remaining below 5% on a vast majority of datasets. Importantly, this approach requires neither architectural changes nor query-time processing, and can be used as a simple drop-in during indexing with any ColBERT-like model.
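To make the idea concrete, the following is a minimal sketch of clustering-based token pooling for a single document. The abstract does not specify the clustering algorithm, so everything here is an assumption for illustration: the function name `pool_tokens`, the `pool_factor` parameter, the greedy agglomerative merge on cosine similarity, and mean pooling followed by L2 normalization. Under this convention, a pool factor of 2 corresponds to the 50% reduction mentioned above, 3 to roughly 66%, and 4 to 75%.

```python
import math


def _normalize(v):
    """Return v scaled to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = _normalize(a), _normalize(b)
    return sum(x * y for x, y in zip(a, b))


def pool_tokens(token_vectors, pool_factor):
    """Reduce one document's token vectors to about len / pool_factor vectors.

    Hypothetical helper: greedily merges the two most similar clusters
    (by centroid cosine similarity) until the target count is reached,
    then stores one mean-pooled, L2-normalized vector per cluster.
    """
    if pool_factor < 1:
        raise ValueError("pool_factor must be >= 1")
    if not token_vectors:
        return []
    target = max(1, math.ceil(len(token_vectors) / pool_factor))
    clusters = [[list(v)] for v in token_vectors]  # one cluster per token
    while len(clusters) > target:
        centroids = [_mean(c) for c in clusters]
        best = None  # (similarity, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = _cosine(centroids[i], centroids[j])
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [_normalize(_mean(c)) for c in clusters]


# Toy document: four 2-dimensional "token" vectors forming two obvious groups.
doc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
pooled = pool_tokens(doc, pool_factor=2)
print(len(doc), "->", len(pooled))
```

Because pooling of this kind only touches the document side at indexing time, the query encoder and the late-interaction scoring would stay as they are, which is consistent with the drop-in claim above. The naive pairwise search is cubic in the number of tokens and is meant for readability rather than speed.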
