Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting

Abstract

Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge.
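
As a rough illustration of the setup described above, the sketch below formats one proportional-analogy question as a multiple-choice prompt, optionally prepends a piece of knowledge, and scores predictions by accuracy. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_prompt` and `accuracy`, the prompt layout, the distractor choice, and the knowledge sentence are all hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence, Tuple

Pair = Tuple[str, str]


def build_prompt(stem: Pair, choices: Sequence[Pair],
                 knowledge: Optional[str] = None) -> str:
    """Format one analogy question as a multiple-choice prompt.

    The layout is an assumption made for illustration only.
    """
    lines = []
    if knowledge:
        # Hypothetical placement: the knowledge text is simply prepended.
        lines.append(f"Knowledge: {knowledge}")
    first, second = stem
    lines.append(f"{first} is to {second} as <blank> is to <blank>")
    for index, (left, right) in enumerate(choices):
        label = chr(ord("A") + index)  # A, B, C, ...
        lines.append(f"({label}) {left} : {right}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted option labels that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    # The stem and correct pair come from the abstract's example; the
    # distractor and the knowledge sentence are invented for illustration.
    prompt = build_prompt(
        ("Oxygen", "Gas"),
        [("Aluminum", "Metal"), ("Water", "Fire")],
        knowledge="Oxygen is a type of gas.",
    )
    print(prompt)
```

Leaving `knowledge` as `None` gives a plain prompt with no added knowledge, so the same function could be used to compare settings with and without augmentation.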
