AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
April 14, 2025
Authors: Yangshen Deng, Zhengxin You, Long Xiang, Qilong Li, Peiqi Yuan, Zhaoyang Hong, Yitao Zheng, Wanting Li, Runzhong Li, Haotian Liu, Kyriakos Mouratidis, Man Lung Yiu, Huan Li, Qiaomu Shen, Rui Mao, Bo Tang
cs.AI
Abstract
AlayaDB is a cutting-edge vector database system natively architected for
efficient and effective long-context inference for Large Language Models (LLMs)
at AlayaDB AI. Specifically, it decouples the KV cache and attention
computation from the LLM inference systems, and encapsulates them into a novel
vector database system. For Model as a Service (MaaS) providers, AlayaDB
consumes fewer hardware resources and offers higher generation quality for
various workloads with different kinds of Service Level Objectives (SLOs)
than existing alternative solutions (e.g., KV cache
disaggregation, retrieval-based sparse attention). The crux of AlayaDB is that
it abstracts the attention computation and cache management for LLM inference
into a query processing procedure, and optimizes the performance via a native
query optimizer. In this work, we demonstrate the effectiveness of AlayaDB via
(i) three use cases from our industry partners, and (ii) extensive experimental
results on LLM inference benchmarks.
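
The idea of treating attention as a query over cached keys and values can be illustrated with a minimal sketch. It is not AlayaDB's API or implementation: `ToyKVStore`, `top_k`, and `sparse_attention` are hypothetical names chosen for illustration, and the brute-force inner-product scan stands in for whatever index and query optimizer a real system would use. The sketch only shows the general shape of retrieval-based sparse attention, where each decoding step retrieves the k cached keys most similar to the query and applies softmax attention over that subset.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyKVStore:
    """Hypothetical stand-in for a vector store that holds a KV cache."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def append(self, key: Vector, value: Vector) -> None:
        """Cache the key/value pair produced for one token."""
        self.keys.append(key)
        self.values.append(value)

    def top_k(self, query: Vector, k: int) -> List[int]:
        """Return positions of the k keys with the largest inner product.

        Assumption: a brute-force scan; a real system would use an index.
        """
        order = sorted(
            range(len(self.keys)),
            key=lambda i: dot(query, self.keys[i]),
            reverse=True,
        )
        return order[:k]


def sparse_attention(store: ToyKVStore, query: Vector, k: int) -> List[float]:
    """Scaled dot-product attention restricted to the top-k retrieved keys."""
    ids = store.top_k(query, k)
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, store.keys[i]) * scale for i in ids])
    dim = len(store.values[0])
    return [
        sum(w * store.values[i][d] for w, i in zip(weights, ids))
        for d in range(dim)
    ]


if __name__ == "__main__":
    store = ToyKVStore()
    store.append([1.0, 0.0], [1.0, 0.0])
    store.append([0.0, 1.0], [0.0, 1.0])
    store.append([-1.0, 0.0], [5.0, 5.0])
    print(sparse_attention(store, [10.0, 0.0], k=2))
```

Setting `k` to the number of cached tokens recovers full attention in this sketch, while smaller values trade some accuracy for less computation per step.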