ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

Abstract

Large language models (LLMs) have significantly impacted many aspects of our lives. However, assessing and ensuring their chronological knowledge remains challenging. Existing approaches fall short in addressing the accumulative nature of knowledge, often relying on a single timestamp. To overcome this, we introduce ChroKnowBench, a benchmark dataset designed to evaluate chronologically accumulated knowledge across three key aspects: multiple domains, time dependency, and temporal state. Our benchmark distinguishes between knowledge that evolves (e.g., scientific discoveries, amended laws) and knowledge that remains constant (e.g., mathematical truths, commonsense facts). Building on this benchmark, we present ChroKnowledge (Chronological Categorization of Knowledge), a novel sampling-based framework for evaluating and updating LLMs' non-parametric chronological knowledge. Our evaluation shows: (1) The ability to elicit temporal knowledge varies depending on the data format the model was trained on. (2) LLMs partially recall knowledge or show a cut-off at temporal boundaries rather than recalling all aspects of knowledge correctly. We therefore apply ChroKnowPrompt, an in-depth prompting method that elicits chronological knowledge by traversing step by step through the surrounding time spans. We observe that our framework successfully updates the overall knowledge across the entire timeline in both the biomedical domain (+11.9%) and the general domain (+2.8%), demonstrating its effectiveness in refining temporal knowledge. Because the approach is non-parametric, it enables knowledge updates in proprietary LLMs as well as open-source models, making it applicable across model types. We perform a comprehensive analysis based on the temporal characteristics of ChroKnowPrompt and validate the potential of various models to elicit intrinsic temporal knowledge through our method.

