NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

Abstract

Neural network performance improves as the number of parameters grows. However, model size is constrained by the on-device memory available during training and inference. Techniques such as quantization can ease this constraint, but they degrade performance. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we achieve memory-efficient training and inference without sacrificing performance. Notably, we reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB while keeping the training dynamics fully unchanged. In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
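To illustrate what "entropy of floating-point numbers" can mean in practice, the following minimal sketch measures the empirical Shannon entropy of the 8-bit exponent field of synthetic float32 values. It is not the NeuZip implementation: the focus on the exponent field, the Gaussian weight distribution with standard deviation 0.02, and the helper names `exponent_bits` and `shannon_entropy` are all assumptions made for illustration.

```python
import math
import random
import struct
from collections import Counter


def exponent_bits(x: float) -> int:
    """Return the 8-bit biased exponent field of x stored as IEEE 754 float32."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return (raw >> 23) & 0xFF


def shannon_entropy(symbols) -> float:
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


random.seed(0)
# Assumption: synthetic "weights" drawn from N(0, 0.02^2), not a real checkpoint.
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]
entropy = shannon_entropy(exponent_bits(w) for w in weights)
print(f"exponent entropy: {entropy:.2f} bits out of 8")
```

If the measured entropy is well below 8 bits, an entropy coder could in principle store the exponents in fewer bits without losing information, which is the general intuition behind lossless weight compression.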
