1.58-bit FLUX

Abstract

We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.
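As a rough illustration of what "1.58-bit weights" means, the sketch below maps a list of floating-point weights to {-1, 0, +1} and packs them at 2 bits per weight. The abstract does not describe the actual quantization rule or storage layout, so the absmean scaling, the 2-bit packing, and the helper names `ternarize`, `pack_ternary`, and `unpack_ternary` are all assumptions made for illustration, not the authors' method or kernel.

```python
import math

# A ternary weight carries log2(3) ≈ 1.58 bits of information,
# which is where the "1.58-bit" name comes from.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def ternarize(weights):
    """Map floats to {-1, 0, +1} with absmean scaling (assumed scheme)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0.0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def pack_ternary(quantized):
    """Pack four ternary values per byte, 2 bits each (hypothetical layout).

    Encoding: -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
    """
    packed = bytearray()
    for start in range(0, len(quantized), 4):
        byte = 0
        for offset, q in enumerate(quantized[start:start + 4]):
            byte |= (q + 1) << (2 * offset)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed, count):
    """Invert pack_ternary and return the first `count` values."""
    values = []
    for byte in packed:
        for offset in range(4):
            values.append(((byte >> (2 * offset)) & 0b11) - 1)
    return values[:count]


weights = [0.9, -0.05, -1.2, 0.4, 0.0, 2.0]
quantized, scale = ternarize(weights)
packed = pack_ternary(quantized)
restored = unpack_ternary(packed, len(quantized))
```

Under this assumed layout, each weight occupies 2 bits, so the packed form is at most 8x smaller than 16-bit storage. That is in the same range as the 7.7x storage reduction reported above, though the abstract does not state which parameters are quantized or how they are stored.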