Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning

Abstract

Large Language Models (LLMs) excel at reasoning and planning when trained on chain-of-thought (CoT) data, where the step-by-step thought process is explicitly outlined by text tokens. However, this results in lengthy inputs where many words support textual coherence rather than core reasoning information, and processing these inputs consumes substantial computational resources. In this work, we propose a hybrid representation of the reasoning process, where we partially abstract away the initial reasoning steps using latent discrete tokens generated by a VQ-VAE, significantly reducing the length of reasoning traces. We explore the use of latent trace abstractions in two scenarios: 1) training the model from scratch for the Keys-Finding Maze problem, 2) fine-tuning LLMs on this hybrid data with an extended vocabulary including unseen latent tokens, for both logical and mathematical reasoning problems. To facilitate effective learning, we introduce a simple training procedure that randomly mixes latent and text tokens, which enables fast adaptation to new latent tokens. Our approach consistently outperforms the baseline methods on various benchmarks.
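
The hybrid representation and the random mixing step can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function name `mix_latent_and_text`, the fixed `chunk_size`, the `<latent_k>` token naming, and the choice to draw the number of abstracted chunks uniformly at random. The VQ-VAE itself is not modeled; its output is stood in for by a precomputed list of code ids per chunk.

```python
import random


def mix_latent_and_text(cot_tokens, latent_codes, chunk_size, rng):
    """Build one hybrid training trace (illustrative sketch only).

    cot_tokens:   list of text tokens making up the full reasoning trace.
    latent_codes: latent_codes[i] is a list of discrete code ids that
                  abstract the i-th chunk of `chunk_size` text tokens
                  (assumed to come from a trained VQ-VAE encoder).
    chunk_size:   number of text tokens covered by one chunk (assumption).
    rng:          a random.Random instance, so the mixing is reproducible.
    """
    # Only whole chunks that also have latent codes can be abstracted.
    num_chunks = min(len(cot_tokens) // chunk_size, len(latent_codes))

    # Randomly choose how many of the initial chunks to replace (0..num_chunks).
    m = rng.randint(0, num_chunks)

    # The first m chunks become latent tokens; the remainder stays as text.
    latent_part = [f"<latent_{code}>" for codes in latent_codes[:m] for code in codes]
    text_part = cot_tokens[m * chunk_size:]
    return latent_part + text_part


if __name__ == "__main__":
    rng = random.Random(0)
    cot = ["First", "add", "2", "and", "3", ".", "Then", "double", "it", "."]
    codes = [[17], [4]]  # hypothetical code ids, one per chunk of 4 tokens
    for _ in range(3):
        print(mix_latent_and_text(cot, codes, chunk_size=4, rng=rng))
```

Because the number of replaced chunks varies from sample to sample, the model sees the same reasoning trace at several levels of abstraction, which is one plausible reading of how random mixing could help it adapt to previously unseen latent tokens.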

