Parallelized Autoregressive Visual Generation

Abstract

Autoregressive models have emerged as a powerful approach for visual generation, but they suffer from slow inference because they predict tokens sequentially, one at a time. In this paper, we propose a simple yet effective approach for parallelized autoregressive visual generation that improves generation efficiency while preserving the advantages of autoregressive modeling. Our key insight is that whether tokens can be generated in parallel depends on the dependencies between them: tokens with weak dependencies can be generated in parallel, while strongly dependent adjacent tokens are difficult to generate together, because sampling them independently may lead to inconsistencies. Based on this observation, we develop a parallel generation strategy that generates distant tokens with weak dependencies in parallel while maintaining sequential generation for strongly dependent local tokens. Our approach can be seamlessly integrated into standard autoregressive models without modifying the architecture or tokenizer. Experiments on ImageNet and UCF-101 demonstrate that our method achieves a 3.6x speedup with comparable quality and up to 9.5x speedup with minimal quality degradation across both image and video generation tasks. We hope this work will inspire future research in efficient visual generation and unified autoregressive modeling. Project page: https://epiphqny.github.io/PAR-project.
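The idea of generating distant tokens together while keeping nearby tokens sequential can be illustrated with a small scheduling sketch. The abstract does not say how tokens are grouped, so everything here is an assumption made for illustration: the square token grid, the split into equal square regions, the raster order inside a region, and the hypothetical function name `parallel_schedule`. It is not the authors' implementation, and it only produces a generation order, not a model.

```python
from typing import List, Tuple

Token = Tuple[int, int]  # (row, col) position on the token grid


def parallel_schedule(grid: int, regions_per_side: int) -> List[List[Token]]:
    """Return generation steps; tokens within one step are produced together.

    Assumption (not from the source): the grid is split into
    regions_per_side x regions_per_side equal square regions.
    """
    if grid % regions_per_side != 0:
        raise ValueError("grid must be divisible by regions_per_side")
    size = grid // regions_per_side  # side length of one region

    steps: List[List[Token]] = []
    # Positions inside a region are visited one after another (raster order),
    # so neighbouring tokens within a region stay sequential.
    for local_r in range(size):
        for local_c in range(size):
            # The token at this local position is taken from every region at
            # once, so tokens in the same step are at least `size` cells apart.
            step = [
                (reg_r * size + local_r, reg_c * size + local_c)
                for reg_r in range(regions_per_side)
                for reg_c in range(regions_per_side)
            ]
            steps.append(step)
    return steps


steps = parallel_schedule(grid=4, regions_per_side=2)
print(len(steps))  # 4 steps instead of 16 for token-by-token generation
print(steps[0])    # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

In this toy layout the number of steps falls from 16 to 4 because each step emits one token per region. The speedups reported in the abstract are measured results and cannot be derived from this step count.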
