FoNE: Precise Single-Token Number Embeddings via Fourier Features

Abstract

Large Language Models (LLMs) typically represent numbers using multiple tokens, which requires the model to aggregate these tokens to interpret numerical values. This fragmentation makes both training and inference less efficient and adversely affects the model's performance on number-related tasks. Inspired by the observation that pre-trained LLMs internally learn Fourier-like features for number tokens, we propose Fourier Number Embedding (FoNE), a novel method that directly maps numbers into the embedding space with their Fourier features. FoNE encodes each number as a single token with only two embedding dimensions per digit, effectively capturing numerical values without fragmentation. This compact representation accelerates both training and inference. Compared to traditional subword and digit-wise embeddings, FoNE not only reduces computational overhead but also achieves higher accuracy across various numerical tasks, including addition, subtraction, and multiplication. On 6-digit decimal addition, FoNE requires 64 times less data to achieve 99% accuracy than subword and digit-wise embeddings, while using 3 times and 6 times fewer tokens per number, respectively. Furthermore, FoNE is the only method that yields 100% accuracy on over 100,000 test examples for addition, subtraction, and multiplication. The code and visualizations are available at https://fouriernumber.github.io/.
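To make the "two embedding dimensions per digit" idea concrete, here is a minimal Python sketch. The abstract does not give the exact feature formula, so the sketch assumes one cosine/sine pair per digit position with periods 10, 100, 1000, and so on. The function names `fourier_number_embedding` and `decode_number` are hypothetical and are not taken from the authors' code.

```python
import math


def fourier_number_embedding(x: int, num_digits: int) -> list[float]:
    """Encode a non-negative integer as one vector with two values per digit.

    Assumption (not stated in the abstract): digit position i uses the
    pair (cos, sin) of 2*pi*x / 10**i, i.e. periods 10, 100, 1000, ...
    """
    features = []
    for i in range(1, num_digits + 1):
        period = 10 ** i
        # Reduce modulo the period in integer arithmetic to keep precision.
        angle = 2 * math.pi * (x % period) / period
        features.extend([math.cos(angle), math.sin(angle)])
    return features


def decode_number(features: list[float], num_digits: int) -> int:
    """Recover the integer digit by digit from the (cos, sin) pairs."""
    value = 0
    for i in range(1, num_digits + 1):
        period = 10 ** i
        cos_v, sin_v = features[2 * (i - 1)], features[2 * (i - 1) + 1]
        # The pair's angle encodes x mod 10**i as a fraction of a full turn.
        angle = math.atan2(sin_v, cos_v) % (2 * math.pi)
        residue = round(angle / (2 * math.pi) * period) % period
        # The i-th digit is the leading digit of x mod 10**i.
        digit = residue // (10 ** (i - 1))
        value += digit * 10 ** (i - 1)
    return value


emb = fourier_number_embedding(123456, num_digits=6)
print(len(emb))  # 12 values: two per digit
print(decode_number(emb, num_digits=6))
```

Under this assumption, a 6-digit number occupies a single token position with 12 feature values, and each digit can be read back from its own pair, which illustrates how a single embedding could capture the value without splitting it across tokens.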
