Learned Compression for Compressed Learning

December 12, 2024
Authors: Dan Jacobellis, Neeraja J. Yadwadkar
cs.AI

Abstract

Modern sensors produce increasingly rich streams of high-resolution data. Due to resource constraints, machine learning systems discard the vast majority of this information via resolution reduction. Compressed-domain learning allows models to operate on compact latent representations, allowing higher effective resolution for the same budget. However, existing compression systems are not ideal for compressed learning. Linear transform coding and end-to-end learned compression systems reduce bitrate, but do not uniformly reduce dimensionality; thus, they do not meaningfully increase efficiency. Generative autoencoders reduce dimensionality, but their adversarial or perceptual objectives lead to significant information loss. To address these limitations, we introduce WaLLoC (Wavelet Learned Lossy Compression), a neural codec architecture that combines linear transform coding with nonlinear dimensionality-reducing autoencoders. WaLLoC sandwiches a shallow, asymmetric autoencoder and entropy bottleneck between the forward and inverse stages of an invertible wavelet packet transform. Across several key metrics, WaLLoC outperforms the autoencoders used in state-of-the-art latent diffusion models. WaLLoC does not require perceptual or adversarial losses to represent high-frequency detail, providing compatibility with modalities beyond RGB images and stereo audio. WaLLoC's encoder consists almost entirely of linear operations, making it exceptionally efficient and suitable for mobile computing, remote sensing, and learning directly from compressed data. We demonstrate WaLLoC's capability for compressed-domain learning across several tasks, including image classification, colorization, document understanding, and music source separation. Our code, experiments, and pre-trained audio and image codecs are available at https://ut-sysml.org/walloc.
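The pipeline described above (an invertible wavelet packet transform, then a dimensionality-reducing encoder and entropy bottleneck, then the inverse transform) can be illustrated with a minimal 1D sketch. This is not the authors' implementation. The Haar filters, the fixed projection matrix `W`, rounding as the quantizer, empirical entropy as the rate estimate, and the linear stand-in decoder are all illustrative assumptions; per the abstract, WaLLoC learns its autoencoder, whose encoder is almost entirely linear while the autoencoder as a whole is nonlinear and asymmetric. The point here is structural: a J-level wavelet packet transform is exactly invertible and trades resolution for channels (2^J subbands of N/2^J samples each), and a per-position linear projection then reduces the channel count, which is where the dimensionality reduction comes from.

```python
import math
from collections import Counter

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One orthonormal Haar analysis step (assumed filter): returns (approx, detail)."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d


def haar_step_inv(a, d):
    """Inverse of haar_step: rebuilds and interleaves the even/odd samples."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / SQRT2)
        x.append((ai - di) / SQRT2)
    return x


def wpt(x, levels):
    """Full wavelet packet transform: split *every* subband at every level.
    Returns 2**levels subbands, each len(x) / 2**levels samples long."""
    assert len(x) % (2 ** levels) == 0, "length must be divisible by 2**levels"
    bands = [list(x)]
    for _ in range(levels):
        nxt = []
        for b in bands:
            a, d = haar_step(b)
            nxt.extend([a, d])
        bands = nxt
    return bands


def iwpt(bands):
    """Inverse wavelet packet transform: merge adjacent subband pairs level by level."""
    while len(bands) > 1:
        bands = [haar_step_inv(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]


def encode(bands, weights, step=1.0):
    """Hypothetical linear encoder: project K subband channels to C = len(weights)
    latent channels at each position, then quantize by rounding (a stand-in for
    the quantizer inside an entropy bottleneck)."""
    n = len(bands[0])
    return [[round(sum(wk * bands[k][t] for k, wk in enumerate(w)) / step) for t in range(n)]
            for w in weights]


def decode(latent, weights, step=1.0):
    """Linear stand-in decoder: dequantize and apply W^T, which is the pseudo-inverse
    when W has orthonormal rows. WaLLoC's real decoder is a learned network."""
    n = len(latent[0])
    bands = [[0.0] * n for _ in range(len(weights[0]))]
    for c, w in enumerate(weights):
        for k, wk in enumerate(w):
            for t in range(n):
                bands[k][t] += wk * latent[c][t] * step
    return bands


def empirical_entropy_bits(symbols):
    """Average bits/symbol under the symbols' own histogram (rough rate estimate)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Usage: 64 samples -> 4 subbands x 16 -> 2 latent channels x 16.
signal = [4.0 * math.sin(0.3 * t) for t in range(64)]
bands = wpt(signal, levels=2)
W = [[0.5, 0.5, 0.5, 0.5],    # hypothetical 2x4 projection with orthonormal rows
     [0.5, -0.5, 0.5, -0.5]]  # (reduces 4 subband channels to 2 latent channels)
z = encode(bands, W, step=0.5)
rate = empirical_entropy_bits([s for ch in z for s in ch])
recon = iwpt(decode(z, W, step=0.5))
mse = sum((a - b) ** 2 for a, b in zip(signal, recon)) / len(signal)
print(f"latent {len(z)}x{len(z[0])}, ~{rate:.2f} bits/latent, MSE {mse:.3f}")
```

Because the wavelet stage is lossless, all of the loss in this sketch comes from the channel projection and quantization; that separation is what lets the transform itself stay cheap and linear.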
