Learned Compression for Compressed Learning
December 12, 2024
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI
Abstract
Modern sensors produce increasingly rich streams of high-resolution data. Due
to resource constraints, machine learning systems discard the vast majority of
this information via resolution reduction. Compressed-domain learning allows
models to operate on compact latent representations, enabling higher effective
resolution for the same budget. However, existing compression systems are not
ideal for compressed learning. Linear transform coding and end-to-end learned
compression systems reduce bitrate, but do not uniformly reduce dimensionality;
thus, they do not meaningfully increase efficiency. Generative autoencoders
reduce dimensionality, but their adversarial or perceptual objectives lead to
significant information loss. To address these limitations, we introduce WaLLoC
(Wavelet Learned Lossy Compression), a neural codec architecture that combines
linear transform coding with nonlinear dimensionality-reducing autoencoders.
WaLLoC sandwiches a shallow, asymmetric autoencoder and entropy bottleneck
between an invertible wavelet packet transform and its inverse. Across several key metrics,
WaLLoC outperforms the autoencoders used in state-of-the-art latent diffusion
models. WaLLoC does not require perceptual or adversarial losses to represent
high-frequency detail, providing compatibility with modalities beyond RGB
images and stereo audio. WaLLoC's encoder consists almost entirely of linear
operations, making it exceptionally efficient and suitable for mobile
computing, remote sensing, and learning directly from compressed data. We
demonstrate WaLLoC's capability for compressed-domain learning across several
tasks, including image classification, colorization, document understanding,
and music source separation. Our code, experiments, and pre-trained audio and
image codecs are available at https://ut-sysml.org/walloc.
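
To make the "invertible wavelet packet transform" in the abstract concrete, here is a minimal sketch of a 1-D wavelet packet analysis and synthesis pair in plain Python. It is not the WaLLoC implementation. The Haar filter, the function names, and the sample signal are all assumptions chosen for brevity, and the learned autoencoder and entropy bottleneck that would sit between the two transforms are omitted. The sketch only shows that the transform rearranges a length-N signal into 2^levels subbands of length N / 2^levels without losing information, which is what lets a learned encoder then reduce the number of channels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar analysis step: returns (lowpass, highpass) halves."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split: rebuilds the even and odd samples in order."""
    x = []
    for a, d in zip(low, high):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def wavelet_packet_analysis(x, levels):
    """Split every subband at every level (a full packet tree).

    Returns 2**levels subbands, each of length len(x) / 2**levels.
    """
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    bands = [list(x)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            low, high = haar_split(band)
            next_bands.extend([low, high])
        bands = next_bands
    return bands

def wavelet_packet_synthesis(bands):
    """Invert wavelet_packet_analysis by merging sibling subbands pairwise."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

# Hypothetical 8-sample signal, used only for illustration.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
subbands = wavelet_packet_analysis(signal, levels=2)
restored = wavelet_packet_synthesis(subbands)
```

With `levels=2`, the 8 samples become 4 subbands of 2 samples each, so the total number of values is unchanged. Any dimensionality reduction in an architecture like the one described would have to come from the stage placed between analysis and synthesis, not from the transform itself.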