Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
February 27, 2025
作者: Sucheng Ren, Qihang Yu, Ju He, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen
cs.AI
Abstract
Autoregressive (AR) modeling, known for its next-token prediction paradigm,
underpins state-of-the-art language and visual generative models.
Traditionally, a "token" is treated as the smallest prediction unit, often a
discrete symbol in language or a quantized patch in vision. However, the
optimal token definition for 2D image structures remains an open question.
Moreover, AR models suffer from exposure bias, where teacher forcing during
training leads to error accumulation at inference. In this paper, we propose
xAR, a generalized AR framework that extends the notion of a token to an entity
X, which can represent an individual patch token, a cell (a k×k
grouping of neighboring patches), a subsample (a non-local grouping of distant
patches), a scale (coarse-to-fine resolution), or even a whole image.
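The difference between a "cell" and a "subsample" can be sketched on a toy grid of patch indices (the 8×8 grid size and k=2 are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

# An 8x8 grid of patch indices stands in for a tokenized image
# (grid size and k are illustrative, not the paper's exact config).
grid = np.arange(64).reshape(8, 8)
k = 2

# "Cell": a k x k grouping of *neighboring* patches. Reshape the grid
# into non-overlapping k x k blocks; each block is one prediction unit.
cells = (grid.reshape(8 // k, k, 8 // k, k)
             .transpose(0, 2, 1, 3)
             .reshape(-1, k, k))

# "Subsample": a non-local grouping of *distant* patches. Strided
# slicing collects every k-th patch, so each unit spans the whole image.
subsamples = [grid[i::k, j::k] for i in range(k) for j in range(k)]

print(cells[0])       # the top-left 2x2 neighborhood: [[0, 1], [8, 9]]
print(subsamples[0])  # every 2nd patch in both directions
```

Each cell is spatially local, while each subsample covers the full image at a coarser stride; both partition the same set of patches into larger prediction units.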
Additionally, we reformulate discrete token classification as
continuous entity regression, leveraging flow-matching methods at each
AR step. This approach conditions training on noisy entities instead of ground
truth tokens, leading to Noisy Context Learning, which effectively alleviates
exposure bias. As a result, xAR offers two key advantages: (1) it enables
flexible prediction units that capture different contextual granularity and
spatial structures, and (2) it mitigates exposure bias by avoiding reliance on
teacher forcing. On the ImageNet-256 generation benchmark, our base model, xAR-B
(172M), outperforms DiT-XL/SiT-XL (675M) while achieving 20× faster
inference. Meanwhile, xAR-H sets a new state of the art with an FID of 1.24,
running 2.2× faster than the previous best-performing model, without
relying on vision foundation modules (e.g., DINOv2) or advanced guidance
interval sampling.
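The per-step continuous entity regression described above can be sketched as rectified-flow-style velocity matching. This is a hedged sketch: the linear interpolation schedule, the flattened entity shape, and the stand-in "model" are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(entity, t, noise):
    """One AR step's flow-matching setup (rectified-flow style).

    x_t moves linearly from pure noise (t=0) to the clean entity (t=1);
    the regression target is the constant velocity (entity - noise).
    """
    x_t = (1.0 - t) * noise + t * entity
    velocity = entity - noise
    return x_t, velocity

# Toy setup: a "cell" entity (k x k grouping of d-dim patch tokens),
# flattened to one vector. Shapes are illustrative.
k, d = 2, 4
entity = rng.standard_normal(k * k * d)
noise = rng.standard_normal(k * k * d)
t = 0.3

x_t, v_target = flow_matching_targets(entity, t, noise)

# Stand-in prediction: in xAR this would come from a transformer
# conditioned on previously generated *noisy* entities (Noisy Context
# Learning), not on ground-truth tokens as in teacher forcing.
v_pred = v_target + 0.1 * rng.standard_normal(v_target.shape)

# The per-step training objective is plain regression (MSE) on the
# velocity, replacing discrete token classification.
loss = np.mean((v_pred - v_target) ** 2)
```

Because the model's context at training time is the noisy interpolant `x_t` rather than clean ground-truth tokens, train-time and inference-time contexts are distributed similarly, which is how the framework avoids the exposure bias of teacher forcing.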