Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

February 24, 2025
Authors: Zhenglin Wang, Jialong Wu, Pengfei LI, Yong Jiang, Deyu Zhou
cs.AI

Abstract

Temporal reasoning is fundamental to human cognition and is crucial for various real-world applications. While recent advances in Large Language Models (LLMs) have demonstrated promising capabilities in temporal reasoning, existing benchmarks primarily rely on rule-based construction, lack contextual depth, and involve a limited range of temporal entities. To address these limitations, we introduce Chinese Time Reasoning (CTM), a benchmark designed to evaluate LLMs on temporal reasoning within the extensive scope of Chinese dynastic chronology. CTM emphasizes cross-entity relationships, pairwise temporal alignment, and contextualized, culturally grounded reasoning, providing a comprehensive evaluation. Extensive experimental results reveal the challenges posed by CTM and highlight potential avenues for improvement.
