Learned Compression for Compressed Learning

Abstract

Modern sensors produce increasingly rich streams of high-resolution data. Due to resource constraints, machine learning systems discard the vast majority of this information via resolution reduction. Compressed-domain learning lets models operate on compact latent representations, yielding higher effective resolution for the same budget. However, existing compression systems are not ideal for compressed learning. Linear transform coding and end-to-end learned compression systems reduce bitrate but do not uniformly reduce dimensionality, so they do not meaningfully increase efficiency. Generative autoencoders reduce dimensionality, but their adversarial or perceptual objectives lead to significant information loss. To address these limitations, we introduce WaLLoC (Wavelet Learned Lossy Compression), a neural codec architecture that combines linear transform coding with nonlinear dimensionality-reducing autoencoders. WaLLoC sandwiches a shallow, asymmetric autoencoder and entropy bottleneck between an invertible wavelet packet transform and its inverse. Across several key metrics, WaLLoC outperforms the autoencoders used in state-of-the-art latent diffusion models. WaLLoC does not require perceptual or adversarial losses to represent high-frequency detail, providing compatibility with modalities beyond RGB images and stereo audio. WaLLoC's encoder consists almost entirely of linear operations, making it exceptionally efficient and suitable for mobile computing, remote sensing, and learning directly from compressed data. We demonstrate WaLLoC's capability for compressed-domain learning across several tasks, including image classification, colorization, document understanding, and music source separation. Our code, experiments, and pre-trained audio and image codecs are available at https://ut-sysml.org/walloc.
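The sandwich structure described in the abstract can be illustrated with a toy sketch. This is not the released WaLLoC code: the function names (`haar_wpt`, `haar_iwpt`, `toy_codec`), the choice of the Haar wavelet, the random linear encoder, the pseudo-inverse decoder, and plain rounding in place of a learned entropy bottleneck are all assumptions made for illustration. The sketch only shows how an invertible wavelet packet transform trades signal length for channels, and how a channel-reducing map applied in that domain lowers dimensionality uniformly.

```python
import numpy as np

def haar_wpt(x, levels):
    """Full-tree Haar wavelet packet analysis (illustrative, 1D).

    x: array of shape (channels, length); length must be divisible by 2**levels.
    Returns an array of shape (channels * 2**levels, length // 2**levels).
    """
    for _ in range(levels):
        even, odd = x[:, 0::2], x[:, 1::2]
        low = (even + odd) / np.sqrt(2.0)
        high = (even - odd) / np.sqrt(2.0)
        # Packet transform: both low and high bands are split again next level.
        x = np.concatenate([low, high], axis=0)
    return x

def haar_iwpt(y, levels):
    """Exact inverse of haar_wpt."""
    for _ in range(levels):
        half = y.shape[0] // 2
        low, high = y[:half], y[half:]
        x = np.empty((half, y.shape[1] * 2), dtype=y.dtype)
        x[:, 0::2] = (low + high) / np.sqrt(2.0)
        x[:, 1::2] = (low - high) / np.sqrt(2.0)
        y = x
    return y

def toy_codec(x, levels=3, latent_channels=2, seed=0):
    """Hypothetical stand-in for the analysis / bottleneck / synthesis pipeline."""
    rng = np.random.default_rng(seed)
    bands = haar_wpt(x, levels)
    # Assumption: a single random linear map plays the role of the shallow encoder.
    w_enc = rng.standard_normal((latent_channels, bands.shape[0]))
    # Assumption: rounding stands in for the learned entropy bottleneck.
    latent = np.round(w_enc @ bands)
    # Assumption: a pseudo-inverse replaces the learned nonlinear decoder.
    w_dec = np.linalg.pinv(w_enc)
    recon = haar_iwpt(w_dec @ latent, levels)
    return latent, recon

if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((2, 1024))  # e.g. a stereo clip
    latent, recon = toy_codec(x)
    print(x.shape, latent.shape, recon.shape)
```

With a two-channel input of length 1024 and three levels, the transform yields 16 sub-band channels of length 128, and the toy encoder keeps 2 of them, so the latent holds 256 values instead of 2048. A downstream model operating on such a latent would process one eighth as many elements, which is the kind of uniform dimensionality reduction the abstract contrasts with bitrate-only reduction. The reconstruction here is lossy and of no practical quality, since nothing is trained.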
