Counting Ability of Large Language Models and Impact of Tokenization

Abstract

Transformers, the backbone of modern large language models (LLMs), face inherent architectural limitations that impede their reasoning capabilities. Unlike recurrent networks, Transformers lack recurrent connections, which confines them to constant-depth computation. This restriction places them in the complexity class TC^0, making them theoretically incapable of solving tasks whose required reasoning depth grows with input length. Counting, a fundamental component of many reasoning tasks, is one such task: performing it inductively requires reasoning depth that grows linearly with the input. While previous studies have established upper limits on the counting ability of Transformer-based expert models (i.e., models trained specifically for counting tasks), these findings do not directly extend to general-purpose LLMs because the two differ in their reasoning mechanisms. Recent work has highlighted how Chain of Thought (CoT) reasoning can alleviate some of the architectural limitations of Transformers in counting tasks. However, little attention has been paid to the role of tokenization in these models. Unlike expert models, which often use character-level tokenization, LLMs typically rely on byte-level Byte Pair Encoding (BPE) tokenizers, which fundamentally change how reasoning is processed. Our work investigates the impact of tokenization on the counting abilities of LLMs and uncovers substantial performance variations depending on how the input is tokenized. We provide both theoretical and experimental analyses, offering insights into how tokenization choices can undermine models' theoretical computability and thereby inspiring the design of new tokenization methods to enhance reasoning in LLMs.
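To make the contrast between character-level and BPE tokenization concrete, here is a toy sketch in Python. It assumes a simplified BPE learner trained on a single string (no byte-level pre-tokenization, no word boundaries), and the helpers `learn_bpe_merges`, `bpe_tokenize`, and `inductive_count` are hypothetical illustrations, not the tokenizer or method studied in this work. `inductive_count` mimics a CoT-style counter that emits one intermediate count per token: over character-level tokens each step adds 0 or 1, whereas over BPE tokens a single step must already know how many target characters are packed inside a multi-character token.

```python
from collections import Counter


def apply_merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every left-to-right occurrence of `pair` with its concatenation."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def learn_bpe_merges(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Toy BPE: repeatedly merge the most frequent adjacent pair (ties -> first seen)."""
    tokens = list(corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        tokens = apply_merge(tokens, best)
    return merges


def bpe_tokenize(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Start from characters and apply the learned merges in order."""
    tokens = list(text)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens


def inductive_count(tokens: list[str], target: str) -> list[int]:
    """CoT-style trace: one running count per token.

    With single-character tokens each step adds 0 or 1; with multi-character
    tokens one step must add the number of `target` characters inside the token.
    """
    count = 0
    trace = []
    for tok in tokens:
        count += tok.count(target)
        trace.append(count)
    return trace


if __name__ == "__main__":
    merges = learn_bpe_merges("aaabaaabaaab", num_merges=2)
    text = "aaabaab"
    print(merges)                                # [('a', 'a'), ('aa', 'a')]
    print(bpe_tokenize(text, merges))            # ['aaa', 'b', 'aa', 'b']
    print(inductive_count(list(text), "a"))      # [1, 2, 3, 3, 4, 5, 5]
    print(inductive_count(bpe_tokenize(text, merges), "a"))  # [3, 3, 5, 5]
```

Both traces end at the same count, but the BPE trace jumps by 3 on the single token `aaa`. In an actual LLM that token reaches the model as an opaque ID, so the number of letters it contains is not directly visible in the input.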
