

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

October 28, 2024
Authors: Yongchang Hao, Yanshuai Cao, Lili Mou
cs.AI

Abstract

The performance of neural networks improves when more parameters are used. However, the model sizes are constrained by the available on-device memory during training and inference. Although applying techniques like quantization can alleviate the constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we are able to achieve memory-efficient training and inference without sacrificing performance. Notably, we significantly reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB, while keeping the training dynamics fully unchanged. In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
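The abstract attributes NeuZip's savings to the entropy of floating-point numbers in neural networks. As a rough, hypothetical illustration of why such weights are compressible (not the authors' implementation), the sketch below splits float32 weights into sign, exponent, and mantissa bits, measures the empirical entropy of the exponent stream, and compresses it losslessly with zlib as a stand-in for a proper entropy coder. The N(0, 0.02) weight distribution and all variable names are illustrative assumptions.

```python
import zlib
import numpy as np

# Illustrative assumption: transformer-style weights are small and centred
# at zero, e.g. roughly N(0, 0.02^2) as in common initialisations.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# Reinterpret the IEEE-754 bit pattern: 1 sign bit, 8 exponent bits,
# 23 mantissa bits.
bits = weights.view(np.uint32)
sign = (bits >> 31).astype(np.uint8)
exponent = ((bits >> 23) & 0xFF).astype(np.uint8)
mantissa = bits & 0x7FFFFF  # kept only to show the three-way split

# Empirical entropy of the exponent byte (bits per symbol). Because the
# weights cluster near zero, the exponents occupy a narrow range and the
# entropy is far below the 8 raw bits they are stored in.
counts = np.bincount(exponent, minlength=256)
p = counts[counts > 0] / exponent.size
entropy = float(-(p * np.log2(p)).sum())
print(f"exponent entropy: {entropy:.2f} bits (vs. 8 raw bits)")

# Lossless compression of the exponent stream with a generic entropy coder
# (zlib here, purely as a stand-in).
compressed = zlib.compress(exponent.tobytes(), level=9)
print(f"exponent bytes: {exponent.size} -> {len(compressed)} "
      f"({8 * len(compressed) / exponent.size:.2f} bits/weight)")
```

Running this shows the exponent stream compressing to a few bits per weight, which is the kind of redundancy an entropy-based scheme like NeuZip can exploit; the paper's actual coder, GPU integration, and training/inference pipelines are not reproduced here.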

