Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting
December 1, 2024
Authors: Thilini Wijesiriwardene, Ruwan Wickramarachchi, Sreeram Vennam, Vinija Jain, Aman Chadha, Amitava Das, Ponnurangam Kumaraguru, Amit Sheth
cs.AI
Abstract
Making analogies is fundamental to cognition. Proportional analogies, which
consist of four terms, are often used to assess linguistic and cognitive
abilities. For instance, completing analogies like "Oxygen is to Gas as <blank>
is to <blank>" requires identifying the semantic relationship (e.g., "type of")
between the first pair of terms ("Oxygen" and "Gas") and finding a second pair
that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work,
we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for
proportional analogy completion and evaluate the performance of contemporary
Large Language Models (LLMs) in various knowledge-enhanced prompt settings.
Specifically, we augment prompts with three types of knowledge: exemplar,
structured, and targeted. Our results show that despite extensive training
data, solving proportional analogies remains challenging for current LLMs, with
the best model achieving an accuracy of 55%. Notably, we find that providing
targeted knowledge can better assist models in completing proportional
analogies compared to providing exemplars or collections of structured
knowledge.
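The three knowledge-enhanced prompt settings named in the abstract (exemplar, structured, and targeted) can be sketched as simple prompt-construction variants. The templates, knowledge triples, and answer choices below are illustrative assumptions for the running "Oxygen is to Gas" example, not the paper's actual prompts or dataset entries.

```python
# Hypothetical sketch of the three knowledge-enhanced prompt settings
# described in the abstract. All templates and knowledge snippets here
# are assumptions for illustration, not the paper's exact prompts.

BASE = 'Complete the analogy: "Oxygen is to Gas as <blank> is to <blank>".'
CHOICES = [
    "A) Aluminum : Metal",
    "B) Water : River",
    "C) Tree : Leaf",
    "D) Car : Road",
]

def build_prompt(setting: str) -> str:
    """Assemble an MCQA prompt, optionally augmented with knowledge."""
    parts = [BASE]
    if setting == "exemplar":
        # Exemplar knowledge: prepend a solved example of the same task.
        parts.insert(0, 'Example: "Dog is to Animal as Rose is to Flower" '
                        '(relation: type of).')
    elif setting == "structured":
        # Structured knowledge: attach a small collection of triples
        # covering the candidate terms.
        parts.insert(0, "Knowledge: (Oxygen, typeOf, Gas); "
                        "(Aluminum, typeOf, Metal); (Water, flowsIn, River)")
    elif setting == "targeted":
        # Targeted knowledge: supply only the specific relation needed
        # to solve this analogy.
        parts.insert(0, 'Hint: the relation between "Oxygen" and "Gas" '
                        'is "type of".')
    parts.append("\n".join(CHOICES))
    parts.append("Answer with A, B, C, or D.")
    return "\n".join(parts)

print(build_prompt("targeted"))
```

The abstract's finding that targeted knowledge helps most is intuitive under this framing: the "targeted" variant hands the model exactly the semantic relation it must transfer, whereas exemplars and structured triples leave the relevant relation for the model to extract on its own.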