Speculative Ad-hoc Querying
March 2, 2025
Authors: Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, Venkat Arun
cs.AI
Abstract
Analyzing large datasets requires responsive query execution, but executing
SQL queries on massive datasets can be slow. This paper explores whether query
execution can begin even before the user has finished typing, allowing results
to appear almost instantly. We propose SpeQL, a system that leverages Large
Language Models (LLMs) to predict likely queries based on the database schema,
the user's past queries, and their incomplete query. Since exact query
prediction is infeasible, SpeQL speculates on partial queries in two ways: 1)
it predicts the query structure to compile and plan queries in advance, and 2)
it precomputes temporary tables that are much smaller than the original
database, but are still predicted to contain all information necessary to
answer the user's final query. Additionally, SpeQL continuously displays
results for speculated queries and subqueries in real time, aiding exploratory
analysis. A utility/user study showed that SpeQL improved task completion time,
and participants reported that its speculative display of results helped them
discover patterns in the data more quickly. In the study, SpeQL reduced users'
query latency by up to 289× and kept the overhead reasonable, at $4 per hour.
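
The second form of speculation, precomputing a small temporary table that is predicted to contain everything the final query needs, can be illustrated with a minimal sketch. This is not SpeQL's implementation: it uses Python's built-in sqlite3 module, the `orders` table, the `spec_orders` temporary table, and the partial and final queries are all hypothetical, and the superset query is picked by hand rather than predicted by an LLM. It only shows that a final query answered from the temporary table returns the same rows as the query run against the original table.

```python
import sqlite3

# Hypothetical final query shape; {table} and {predicate} are filled in below.
FINAL_QUERY = (
    "SELECT region, SUM(amount) FROM {table} "
    "WHERE {predicate} GROUP BY region ORDER BY region"
)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a large table (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, region TEXT, year INTEGER, amount REAL)"
    )
    rows = [
        (1, "EU", 2023, 10.0),
        (2, "EU", 2024, 20.0),
        (3, "US", 2024, 30.0),
        (4, "US", 2024, 40.0),
        (5, "ASIA", 2022, 50.0),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn


def precompute_temp_table(conn: sqlite3.Connection) -> None:
    """Speculation step, run while the user is still typing.

    Assumed partial input so far:
        SELECT region, SUM(amount) FROM orders WHERE year = 2024 AND ...
    The temporary table keeps only the rows and columns that any completion
    of that partial query could need (a hand-picked superset).
    """
    conn.execute(
        "CREATE TEMP TABLE spec_orders AS "
        "SELECT region, amount FROM orders WHERE year = 2024"
    )


def run_direct(conn: sqlite3.Connection) -> list:
    """Answer the final query from the original table."""
    sql = FINAL_QUERY.format(
        table="orders", predicate="year = 2024 AND amount > 15"
    )
    return conn.execute(sql).fetchall()


def run_on_temp(conn: sqlite3.Connection) -> list:
    """Answer the final query from the precomputed temporary table.

    The year filter was already applied during precomputation, so only the
    remaining predicate is evaluated here.
    """
    sql = FINAL_QUERY.format(table="spec_orders", predicate="amount > 15")
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    precompute_temp_table(conn)
    print(run_on_temp(conn) == run_direct(conn))
```

In this sketch the temporary table holds three of the five rows and two of the four columns. The speedups reported in the abstract presumably come from the same effect at a much larger scale, where scanning the original table dominates query latency.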