Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval
February 19, 2025
Authors: Aditya Sharma, Luis Lara, Amal Zouaq, Christopher J. Pal
cs.AI
Abstract
The ability to generate SPARQL queries from natural language questions is
crucial for ensuring efficient and accurate retrieval of structured data from
knowledge graphs (KG). While large language models (LLMs) have been widely
adopted for SPARQL query generation, they are often susceptible to
hallucinations and out-of-distribution errors when producing KG elements like
Uniform Resource Identifiers (URIs) based on internal parametric knowledge.
This often results in content that appears plausible but is factually
incorrect, posing significant challenges for their use in real-world
information retrieval (IR) applications. This has led to increased research
aimed at detecting and mitigating such errors. In this paper, we introduce PGMR
(Post-Generation Memory Retrieval), a modular framework that incorporates a
non-parametric memory module to retrieve KG elements and enhance LLM-based
SPARQL query generation. Our experimental results indicate that PGMR
consistently delivers strong performance across diverse datasets, data
distributions, and LLMs. Notably, PGMR significantly mitigates URI
hallucinations, nearly eliminating the problem in several scenarios.
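The abstract describes PGMR only at a high level, so the following is a minimal, hypothetical sketch of the post-generation memory retrieval idea rather than the authors' implementation. It assumes the LLM emits a SPARQL query in which KG elements appear as bracketed natural-language labels, and a small non-parametric memory of label-to-URI pairs (two toy Wikidata entries here) is searched to ground each label into an actual URI, so the model never has to produce URIs from its parametric knowledge.

```python
# Hypothetical illustration of post-generation memory retrieval (not the
# paper's released code). The placeholder format and memory contents are
# assumptions made for this sketch.
import re
from difflib import SequenceMatcher

# Toy non-parametric memory: KG element labels mapped to their URIs.
KG_MEMORY = {
    "Barack Obama": "http://www.wikidata.org/entity/Q76",
    "place of birth": "http://www.wikidata.org/prop/direct/P19",
}

def retrieve_uri(label: str) -> str:
    """Return the URI of the memory entry most similar to the given label."""
    best = max(
        KG_MEMORY,
        key=lambda k: SequenceMatcher(None, k.lower(), label.lower()).ratio(),
    )
    return KG_MEMORY[best]

def ground_query(generated: str) -> str:
    """Replace each [label] placeholder produced by the LLM with a retrieved URI."""
    return re.sub(r"\[([^\]]+)\]", lambda m: f"<{retrieve_uri(m.group(1))}>", generated)

# Example: the LLM output contains labels, not URIs; retrieval grounds them.
raw = "SELECT ?x WHERE { [Barack Obama] [place of birth] ?x }"
print(ground_query(raw))
# SELECT ?x WHERE { <http://www.wikidata.org/entity/Q76>
#                   <http://www.wikidata.org/prop/direct/P19> ?x }
```

In a realistic setting the exact-match dictionary above would be replaced by a dense or lexical retriever indexed over all KG labels, but the substitution step after generation is the same.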