Balancing Pipeline Parallelism with Vocabulary Parallelism

November 8, 2024
Authors: Man Tsung Yeung, Penghui Qi, Min Lin, Xinyi Wan
cs.AI

Abstract

Pipeline parallelism is widely used to scale the training of transformer-based large language models, and various works have sought to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To reduce the activation memory overhead, we propose several algorithms to reduce communication barriers within vocabulary layers. Additionally, we use a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation-memory-balanced schedules like V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of the vocabulary size, resulting in a 5% to 51% improvement in throughput compared to naive approaches, while also significantly reducing peak memory usage, especially in large-vocabulary scenarios. Our implementation is open-sourced at https://github.com/sail-sg/VocabularyParallelism.
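
To make the idea of partitioning the output vocabulary layer more concrete, here is a minimal single-process sketch in plain Python. It is not the authors' implementation: the function names (`shard_vocab`, `sharded_cross_entropy`, `full_cross_entropy`) are hypothetical, the "devices" are simulated as list slices, and the cross-device reductions are replaced by ordinary `max` and `sum` calls. It only illustrates that a softmax cross-entropy loss can be computed from evenly sized vocabulary shards, with each reduction marking a point where real devices would have to synchronize. The paper's algorithms for reducing those communication barriers are not reproduced here.

```python
import math


def shard_vocab(logits, num_devices):
    """Split one token's logits into contiguous, near-equal vocabulary shards.

    Returns a list of (start_offset, shard_values), one entry per simulated device.
    """
    base, extra = divmod(len(logits), num_devices)
    shards, start = [], 0
    for rank in range(num_devices):
        size = base + (1 if rank < extra else 0)
        shards.append((start, logits[start:start + size]))
        start += size
    return shards


def sharded_cross_entropy(logits, target, num_devices):
    """Cross-entropy for one token, computed shard by shard (assumed sketch)."""
    shards = shard_vocab(logits, num_devices)

    # Reduction 1 (stand-in for an all-reduce max): global max for stability.
    global_max = max(max(vals) for _, vals in shards if vals)

    # Reduction 2 (stand-in for an all-reduce sum): softmax denominator.
    local_sums = [sum(math.exp(v - global_max) for v in vals) for _, vals in shards]
    denom = sum(local_sums)

    # Reduction 3: only the shard that owns the target id contributes its logit.
    target_logit = 0.0
    for offset, vals in shards:
        if offset <= target < offset + len(vals):
            target_logit = vals[target - offset]

    return math.log(denom) + global_max - target_logit


def full_cross_entropy(logits, target):
    """Reference: the same loss computed on a single device."""
    m = max(logits)
    return math.log(sum(math.exp(v - m) for v in logits)) + m - logits[target]


if __name__ == "__main__":
    logits = [0.5, -1.2, 3.0, 0.1, 2.2, -0.7, 1.9]
    for p in (1, 2, 3, 4):
        diff = abs(sharded_cross_entropy(logits, 4, p) - full_cross_entropy(logits, 4))
        print(p, diff < 1e-9)
```

In this toy setup each simulated device holds roughly `len(logits) / num_devices` vocabulary entries, which is the sense in which the vocabulary layer's parameters and computation are spread evenly rather than concentrated on the last pipeline stage.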

