

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

October 28, 2024
Authors: Yongchang Hao, Yanshuai Cao, Lili Mou
cs.AI

Abstract

The performance of neural networks improves when more parameters are used. However, the model sizes are constrained by the available on-device memory during training and inference. Although applying techniques like quantization can alleviate the constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-point numbers in neural networks. With NeuZip, we are able to achieve memory-efficient training and inference without sacrificing performance. Notably, we significantly reduce the memory footprint of training a Llama-3 8B model from 31GB to less than 16GB, while keeping the training dynamics fully unchanged. In inference, our method can reduce memory usage by more than half while maintaining near-lossless performance. Our code is publicly available.
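The abstract attributes NeuZip's savings to the entropy of floating-point numbers in neural networks. As a rough, hypothetical illustration of why such weights are compressible (not the authors' implementation), the sketch below splits float32 weights into sign, exponent, and mantissa bits, measures the empirical entropy of the exponent stream, and compresses it losslessly with zlib as a stand-in for a proper entropy coder. The N(0, 0.02) weight distribution and all variable names are illustrative assumptions.

```python
import zlib
import numpy as np

# Illustrative assumption: transformer-style weights are small and centred
# at zero, e.g. roughly N(0, 0.02^2) as in common initialisations.
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=1_000_000).astype(np.float32)

# Reinterpret the IEEE-754 bit pattern: 1 sign bit, 8 exponent bits,
# 23 mantissa bits.
bits = weights.view(np.uint32)
sign = (bits >> 31).astype(np.uint8)
exponent = ((bits >> 23) & 0xFF).astype(np.uint8)
mantissa = bits & 0x7FFFFF  # kept only to show the three-way split

# Empirical entropy of the exponent byte (bits per symbol). Because the
# weights cluster near zero, the exponents occupy a narrow range and the
# entropy is far below the 8 raw bits they are stored in.
counts = np.bincount(exponent, minlength=256)
p = counts[counts > 0] / exponent.size
entropy = float(-(p * np.log2(p)).sum())
print(f"exponent entropy: {entropy:.2f} bits (vs. 8 raw bits)")

# Lossless compression of the exponent stream with a generic entropy coder
# (zlib here, purely as a stand-in).
compressed = zlib.compress(exponent.tobytes(), level=9)
print(f"exponent bytes: {exponent.size} -> {len(compressed)} "
      f"({8 * len(compressed) / exponent.size:.2f} bits/weight)")
```

Running this shows the exponent stream compressing to a few bits per weight, which is the kind of redundancy an entropy-based scheme like NeuZip can exploit; the paper's actual coder, GPU integration, and training/inference pipelines are not reproduced here.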

