Balancing Pipeline Parallelism with Vocabulary Parallelism

Abstract

Pipeline parallelism is widely used to scale the training of transformer-based large language models, and various works have sought to improve its throughput and memory footprint. In this paper, we address a frequently overlooked issue: the vocabulary layers can cause imbalanced computation and memory usage across pipeline stages, worsening pipeline bubbles and the memory bottleneck. To tackle this, we partition the vocabulary layers evenly across pipeline devices and group the computation into pipeline passes. To reduce the activation memory overhead, we propose several algorithms that reduce the communication barriers within vocabulary layers. Additionally, we use a generalizable method to integrate Vocabulary Parallelism with existing pipeline schedules. By combining these techniques, our methods effectively balance computation and parameter memory, with only a small constant activation memory overhead. Notably, when combined with activation memory-balanced schedules such as V-Half, our approach achieves perfect balance in both memory and computation. Extensive evaluations demonstrate that our method achieves computation and memory balance regardless of the vocabulary size, yielding a 5% to 51% improvement in throughput over naive approaches while significantly reducing peak memory usage, especially in large-vocabulary scenarios. Our implementation is open-sourced at https://github.com/sail-sg/VocabularyParallelism .
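The following is a minimal, single-process sketch of what partitioning a vocabulary layer evenly across devices can look like for the output softmax cross-entropy. It assumes the output (unembedding) weight is split row-wise by vocabulary entry across p devices. All function names (`shard_rows`, `local_stats`, `merge_stats`, `vocab_parallel_loss`) are hypothetical and this is not the paper's implementation. Each device computes only local statistics: a max, a sum of exponentials relative to that max, and the target logit if it owns the target token. Rescaling each local sum to the global max (the online-softmax trick) lets the max and the sum be merged after a single exchange of a few scalars per token instead of two separate reductions. This illustrates one way to cut communication barriers; the paper's own algorithms may differ.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]  # one row per vocabulary entry


def shard_rows(weight: Matrix, num_devices: int) -> List[Tuple[int, Matrix]]:
    """Split vocabulary rows into contiguous shards whose sizes differ by at most one.

    Returns (offset, rows) pairs, one per (simulated) device.
    """
    assert len(weight) >= num_devices, "need at least one vocabulary row per device"
    base, extra = divmod(len(weight), num_devices)
    shards, start = [], 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)
        shards.append((start, weight[start:start + size]))
        start += size
    return shards


def local_stats(hidden: Vector, offset: int, rows: Matrix,
                target: int) -> Tuple[float, float, Optional[float]]:
    """Per-device work: local logits, local max, and sum of exp relative to that max."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in rows]
    local_max = max(logits)
    local_sum = sum(math.exp(z - local_max) for z in logits)
    # Only the device whose shard contains the target token reports its logit.
    in_shard = offset <= target < offset + len(rows)
    return local_max, local_sum, (logits[target - offset] if in_shard else None)


def merge_stats(stats: List[Tuple[float, float, Optional[float]]]) -> float:
    """Merge per-device triples (as if after one all-gather) into the cross-entropy loss.

    Each local sum is rescaled from its own max to the global max, so the max and
    the sum do not need two separate communication rounds.
    """
    global_max = max(m for m, _, _ in stats)
    global_sum = sum(s * math.exp(m - global_max) for m, s, _ in stats)
    target_logit = next(t for _, _, t in stats if t is not None)
    return math.log(global_sum) + global_max - target_logit


def vocab_parallel_loss(hidden: Vector, weight: Matrix, target: int,
                        num_devices: int) -> float:
    """Simulate p devices sequentially; the merge stands in for one collective."""
    stats = [local_stats(hidden, off, rows, target)
             for off, rows in shard_rows(weight, num_devices)]
    return merge_stats(stats)


def reference_loss(hidden: Vector, weight: Matrix, target: int) -> float:
    """Unsharded softmax cross-entropy, for comparison."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in weight]
    m = max(logits)
    return math.log(sum(math.exp(z - m) for z in logits)) + m - logits[target]


if __name__ == "__main__":
    hidden = [0.5, -1.0, 2.0]
    weight = [[0.1 * (i + j) - 0.3 for j in range(3)] for i in range(10)]
    for p in (1, 3, 4):
        print(p, vocab_parallel_loss(hidden, weight, 7, p),
              reference_loss(hidden, weight, 7))
```

Each printed line shows the sharded and unsharded losses, which should agree up to floating-point rounding for any device count. In a real multi-GPU run, each `local_stats` call would execute on its own rank and `merge_stats` would follow a collective such as an all-gather of the three scalars per token.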
