AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference

Abstract

AlayaDB is a cutting-edge vector database system natively architected for efficient and effective long-context inference for Large Language Models (LLMs) at AlayaDB AI. Specifically, it decouples the KV cache and attention computation from LLM inference systems and encapsulates them in a novel vector database system. For Model-as-a-Service (MaaS) providers, AlayaDB consumes fewer hardware resources and offers higher generation quality for workloads with different Service Level Objectives (SLOs) than existing alternatives (e.g., KV cache disaggregation, retrieval-based sparse attention). The crux of AlayaDB is that it abstracts the attention computation and cache management of LLM inference into a query processing procedure and optimizes performance via a native query optimizer. In this work, we demonstrate the effectiveness of AlayaDB via (i) three use cases from our industry partners, and (ii) extensive experimental results on LLM inference benchmarks.
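
To make the idea of treating attention as a query over cached keys more concrete, here is a minimal sketch in plain Python. It is not AlayaDB's API or its algorithm. The function names (`full_attention`, `topk_attention`), the brute-force top-k scan, and the toy vectors are all assumptions chosen for illustration. The sketch only shows the general shape of retrieval-based sparse attention: score the cached keys against the query, keep the k highest-scoring entries, and run softmax attention over that subset.

```python
import math


def _scaled_scores(query, keys):
    # Scaled dot-product score of the query against every cached key.
    scale = math.sqrt(len(query))
    return [sum(q * x for q, x in zip(query, key)) / scale for key in keys]


def _softmax_weighted_sum(scores, values):
    # Numerically stable softmax over the scores, then a weighted sum of values.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def full_attention(query, keys, values):
    # Baseline: attend to every cached token.
    return _softmax_weighted_sum(_scaled_scores(query, keys), values)


def topk_attention(query, keys, values, k):
    # Hypothetical "attention as a query": retrieve the k keys with the
    # largest inner product, then attend only to those entries.
    # A real system would use a vector index instead of this linear scan.
    scores = _scaled_scores(query, keys)
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return _softmax_weighted_sum([scores[i] for i in top],
                                 [values[i] for i in top])


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [10.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    query = [1.0, 0.0]
    print(full_attention(query, keys, values))
    print(topk_attention(query, keys, values, k=1))
```

When k equals the number of cached tokens, the sparse variant reduces to full attention. With a smaller k, it reads fewer cached entries and returns an approximation of the full result.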
