Counting Ability of Large Language Models and Impact of Tokenization
October 25, 2024
Authors: Xiang Zhang, Juntai Cao, Chenyu You
cs.AI
Abstract
Transformers, the backbone of modern large language models (LLMs), face
inherent architectural limitations that impede their reasoning capabilities.
Unlike recurrent networks, Transformers lack recurrent connections, confining
them to constant-depth computation. This restriction places them in the
complexity class TC^0, making them theoretically incapable of solving tasks
that demand increasingly deep reasoning as input length grows. Counting, a
fundamental component of many reasoning tasks, likewise requires reasoning depth
that grows linearly with input length in order to be performed inductively.
While previous studies have
established the upper limits of counting ability in Transformer-based expert
models (i.e., models specifically trained for counting tasks), these findings
do not directly extend to general-purpose LLMs due to differences in reasoning
mechanisms. Recent work has highlighted how Chain of Thought (CoT) reasoning
can help alleviate some of the architectural limitations of Transformers in
counting tasks. However, little attention has been paid to the role of
tokenization in these models. Unlike expert models that often use
character-level tokenization, LLMs typically rely on byte-level (BPE)
tokenizers, a choice that fundamentally alters the way reasoning is processed. Our work
investigates the impact of tokenization on the counting abilities of LLMs,
uncovering substantial performance variations based on input tokenization
differences. We provide both theoretical and experimental analyses, offering
insights into how tokenization choices can undermine models' theoretical
computability, thereby inspiring the design of new tokenization methods to
enhance reasoning in LLMs.
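To make the tokenization effect concrete, the following minimal sketch (assuming the tiktoken package and its "cl100k_base" byte-level BPE encoding; the example strings are illustrative, not the paper's data) shows how the same eight items to be counted are split into very different token sequences depending on surface formatting:

import tiktoken

# Byte-level BPE encoding of the kind used by recent OpenAI models (assumed available).
enc = tiktoken.get_encoding("cl100k_base")

for text in ("aaaaaaaa", "a a a a a a a a", "a,a,a,a,a,a,a,a"):
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    # Character-level tokenization would emit one token per character, so each
    # counted item maps to its own token; byte-level BPE may instead merge a run
    # such as "aaaaaaaa" into one or two multi-character tokens.
    print(f"{text!r:>20} -> {len(tokens)} tokens: {pieces}")

Under the character-level tokenization typical of the expert models discussed above, each counted item occupies its own token, so the mapping between tokens and items is trivial; under byte-level BPE the model must recover the item count from inside merged tokens, which is the kind of tokenization-induced difficulty the abstract describes.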