

Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models

January 30, 2025
Authors: Qika Lin, Tianzhe Zhao, Kai He, Zhen Peng, Fangzhi Xu, Ling Huang, Jingying Ma, Mengling Feng
cs.AI

Abstract
Due to the natural gap between Knowledge Graph (KG) structures and natural language, effectively integrating the holistic structural information of KGs with Large Language Models (LLMs) has emerged as a significant challenge. To this end, we propose a two-stage framework to learn and apply quantized codes for each entity, aiming for the seamless integration of KGs with LLMs. First, a self-supervised quantized representation (SSQR) method is proposed to compress both KG structural and semantic knowledge into discrete codes (i.e., tokens) that align with the format of language sentences. We further design KG instruction-following data that treats these learned codes as features to be input directly to LLMs, thereby achieving seamless integration. Experimental results demonstrate that SSQR outperforms existing unsupervised quantized methods, producing more distinguishable codes. Furthermore, the fine-tuned LLaMA2 and LLaMA3.1 models also achieve superior performance on KG link prediction and triple classification tasks, using only 16 tokens per entity instead of the thousands required by conventional prompting methods.
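To make the two-stage idea concrete, the sketch below illustrates one plausible reading of the abstract: each entity is represented by 16 latent vectors that are snapped to a learned codebook (a VQ-style quantization, assumed here since the abstract does not specify the exact SSQR objective), and the resulting discrete codes are then formatted as special tokens inside a link-prediction instruction for an LLM. All names (CODEBOOK_SIZE, quantize_entity, build_link_prediction_prompt, the <kg_*> token format) are hypothetical illustrations, not the authors' implementation.

```python
# Minimal sketch, not the paper's actual SSQR method: quantize entity latents
# into 16 discrete codes via nearest-neighbour codebook lookup, then render
# the codes as tokens in a KG instruction-following prompt.
import torch

CODEBOOK_SIZE = 1024      # number of code vectors in the codebook (assumed)
CODES_PER_ENTITY = 16     # the abstract states 16 tokens per entity
EMBED_DIM = 256           # latent dimension (assumed)

# Learnable codebook; in the paper this would be trained with a
# self-supervised quantization objective over the KG.
codebook = torch.nn.Embedding(CODEBOOK_SIZE, EMBED_DIM)

def quantize_entity(entity_latents: torch.Tensor) -> list[int]:
    """entity_latents: (CODES_PER_ENTITY, EMBED_DIM) latent vectors for one
    entity (e.g. produced by a graph encoder). Each latent is mapped to the
    id of its nearest codebook vector, yielding 16 discrete codes."""
    dists = torch.cdist(entity_latents, codebook.weight)   # (16, CODEBOOK_SIZE)
    return dists.argmin(dim=-1).tolist()

def build_link_prediction_prompt(head_codes: list[int], relation: str,
                                 candidates: dict[str, list[int]]) -> str:
    """Render quantized codes as special tokens inside an instruction, so the
    LLM reads ~16 tokens per entity rather than a long textual neighbourhood
    description (the prompt wording is illustrative)."""
    def codes_to_tokens(codes: list[int]) -> str:
        return " ".join(f"<kg_{c}>" for c in codes)
    lines = [f"Head entity: {codes_to_tokens(head_codes)}",
             f"Relation: {relation}",
             "Candidate tail entities:"]
    for name, codes in candidates.items():
        lines.append(f"- {name}: {codes_to_tokens(codes)}")
    lines.append("Which candidate is the correct tail entity?")
    return "\n".join(lines)

# Example usage with random placeholder latents.
latents = torch.randn(CODES_PER_ENTITY, EMBED_DIM)
head = quantize_entity(latents)
print(build_link_prediction_prompt(
    head, "born_in",
    {"EntityA": quantize_entity(torch.randn(CODES_PER_ENTITY, EMBED_DIM)),
     "EntityB": quantize_entity(torch.randn(CODES_PER_ENTITY, EMBED_DIM))}))
```

In this reading, the instruction-following data of the second stage pairs such prompts with the correct tail entity as the target, and the LLM is fine-tuned on them; the codebook tokens act as compact stand-ins for each entity's structural and semantic context.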

