Counting Ability of Large Language Models and Impact of Tokenization
October 25, 2024
Authors: Xiang Zhang, Juntai Cao, Chenyu You
cs.AI
Abstract
Transformers, the backbone of modern large language models (LLMs), face
inherent architectural limitations that impede their reasoning capabilities.
Unlike recurrent networks, Transformers lack recurrent connections, confining
them to constant-depth computation. This restriction places them in the
complexity class TC^0, making them theoretically incapable of solving tasks
that demand increasingly deep reasoning as input length grows. Counting, a
fundamental component of many reasoning tasks, likewise requires reasoning
depth that grows linearly with input length when performed inductively. While
previous studies have
established the upper limits of counting ability in Transformer-based expert
models (i.e., models specifically trained for counting tasks), these findings
do not directly extend to general-purpose LLMs due to differences in reasoning
mechanisms. Recent work has highlighted how Chain of Thought (CoT) reasoning
can help alleviate some of the architectural limitations of Transformers in
counting tasks. However, little attention has been paid to the role of
tokenization in these models. Unlike expert models that often use
character-level tokenization, LLMs typically rely on byte-level (BPE)
tokenizers, which fundamentally alter the way reasoning is processed. Our work
investigates the impact of tokenization on the counting abilities of LLMs,
uncovering substantial performance variations based on input tokenization
differences. We provide both theoretical and experimental analyses, offering
insights into how tokenization choices can undermine models' theoretical
computability, thereby inspiring the design of new tokenization methods to
enhance reasoning in LLMs.
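The abstract's central contrast, character-level versus byte-level BPE tokenization, is easy to see on a toy counting query. The sketch below is not from the paper; it assumes the tiktoken library and its cl100k_base encoding purely as a stand-in for whatever BPE tokenizer a given LLM uses, and shows how a run of identical characters collapses into a few multi-character tokens, so the model no longer receives one token per character to count.

```python
# A minimal sketch (not from the paper): how byte-level BPE tokenization can
# obscure character counts. Assumes the `tiktoken` package; the cl100k_base
# encoding is used here only as a stand-in for an LLM's actual BPE tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "a" * 17  # a toy counting query: how many 'a' characters are there?
token_ids = enc.encode(text)

print("characters:", len(text))        # 17
print("BPE tokens:", len(token_ids))   # typically far fewer than 17
print("token pieces:", [enc.decode([t]) for t in token_ids])

# Under character-level tokenization the model would receive 17 tokens and
# could count them one step at a time; under BPE it instead sees a few
# multi-character chunks and must infer the length of each one.
```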