
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia

March 3, 2025
作者: Chenxi Wang, Tianle Gu, Zhongyu Wei, Lang Gao, Zirui Song, Xiuying Chen
cs.AI

Abstract

Human readers can efficiently comprehend scrambled words, a phenomenon known as Typoglycemia, primarily by relying on word form; if word form alone is insufficient, they further utilize contextual cues for interpretation. While advanced large language models (LLMs) exhibit similar abilities, the underlying mechanisms remain unclear. To investigate this, we conduct controlled experiments to analyze the roles of word form and contextual information in semantic reconstruction and examine LLM attention patterns. Specifically, we first propose SemRecScore, a reliable metric to quantify the degree of semantic reconstruction, and validate its effectiveness. Using this metric, we study how word form and contextual information influence LLMs' semantic reconstruction ability, identifying word form as the core factor in this process. Furthermore, we analyze how LLMs utilize word form and find that they rely on specialized attention heads to extract and process word form information, with this mechanism remaining stable across varying levels of word scrambling. This distinction between LLMs' fixed attention patterns primarily focused on word form and human readers' adaptive strategy in balancing word form and contextual information provides insights into enhancing LLM performance by incorporating human-like, context-aware mechanisms.
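The abstract does not spell out the exact scrambling procedure, but the classic typoglycemia transform keeps each word's first and last letters fixed and shuffles the interior. A minimal sketch of such a scrambler, assuming this standard formulation (the function names `scramble_word` and `scramble_text` are illustrative, not from the paper):

```python
import random

def scramble_word(word: str, rng: random.Random) -> str:
    """Shuffle a word's interior letters, keeping the first and last letters fixed.

    Words of three characters or fewer have no interior to shuffle and
    are returned unchanged.
    """
    if len(word) <= 3:
        return word
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

def scramble_text(text: str, seed: int = 0) -> str:
    """Apply typoglycemia-style scrambling to every whitespace-separated word."""
    rng = random.Random(seed)
    return " ".join(scramble_word(w, rng) for w in text.split())

print(scramble_text("semantic reconstruction under typoglycemia"))
```

The paper additionally varies the level of scrambling; a natural extension would shuffle only a fraction of the interior positions, with the fraction as a parameter.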

