SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Abstract

Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by transforming natural language queries into structured SQL statements. Despite recent advances in human-computer interaction within database applications, significant challenges persist, particularly in inference performance on complex scenarios involving multi-table joins and nested queries. Current methods primarily use supervised fine-tuning (SFT) to train NL2SQL models, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). To improve the reasoning performance of NL2SQL models in these complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL-based reward function tailored to NL2SQL tasks and discuss the impact of cold start on the effectiveness of RL training. In addition, we achieve competitive accuracy using only a small amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In existing experiments, SQL-R1 achieves execution accuracy of 88.6% and 66.6% on the Spider and BIRD benchmarks, respectively, using only a 7B base model.
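The execution accuracy figures quoted above count a predicted query as correct when running it returns the same result as the reference query. The sketch below illustrates that idea with Python's built-in sqlite3 module. It is not the evaluation script used for SQL-R1, Spider, or BIRD: the function names, the toy schema in build_demo_db, and the choice to compare results as order-insensitive multisets of rows are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute sql and return its rows as a multiset, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Counter ignores row order but keeps duplicate rows (an assumption;
    # real benchmarks define their own comparison rules).
    return Counter(rows)


def execution_match(conn, predicted_sql, gold_sql):
    """True when both queries run and return the same multiset of rows."""
    predicted = run_query(conn, predicted_sql)
    gold = run_query(conn, gold_sql)
    return predicted is not None and gold is not None and predicted == gold


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


def build_demo_db():
    """Create a hypothetical two-table in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                          dept_id INTEGER, salary INTEGER);
        INSERT INTO dept VALUES (1, 'Sales'), (2, 'Eng');
        INSERT INTO emp VALUES (1, 'Ann', 1, 100),
                               (2, 'Bob', 1, 150),
                               (3, 'Cy', 2, 200);
        """
    )
    return conn


# Hypothetical predictions paired with reference queries.
DEMO_PAIRS = [
    # A join and a nested query that return the same rows.
    ("SELECT e.name FROM emp e JOIN dept d ON e.dept_id = d.id "
     "WHERE d.name = 'Sales'",
     "SELECT name FROM emp WHERE dept_id IN "
     "(SELECT id FROM dept WHERE name = 'Sales')"),
    # An off-by-one comparison that changes the result set.
    ("SELECT name FROM emp WHERE salary > 100",
     "SELECT name FROM emp WHERE salary >= 100"),
    # A prediction that fails to execute (misspelled column).
    ("SELECT nme FROM emp", "SELECT name FROM emp"),
]
```

Calling execution_accuracy(build_demo_db(), DEMO_PAIRS) scores the three pairs together. Treating a query that fails to execute as a miss, rather than raising, is a design choice that keeps one malformed prediction from aborting the whole evaluation.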
