Entropy-Guided Attention for Private LLMs

Abstract

The pervasiveness of proprietary language models has raised critical privacy concerns. This has made advances in private inference (PI) necessary: in PI, computations are performed directly on encrypted data without revealing users' sensitive information. PI offers a promising solution, but its practical deployment is hindered by substantial communication and latency overheads that stem primarily from nonlinear operations. To address this, we introduce an information-theoretic framework that characterizes the role of nonlinearities in decoder-only language models. The framework lays a principled foundation for optimizing transformer architectures tailored to the demands of PI. Using Shannon's entropy as a quantitative measure, we uncover a previously unexplored dual role of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention-head diversity. Specifically, we find that removing them triggers two critical failure modes. The first is "entropy collapse" in deeper layers, which destabilizes training. The second is "entropic overload" in earlier layers, which leaves the representational capacity of Multi-Head Attention (MHA) under-utilized. We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore PI-friendly alternatives to layer normalization that prevent entropy collapse and stabilize the training of LLMs with reduced nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures. The code and implementation are available at https://github.com/Nandan91/entropy-guided-attention-llm.
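The Python sketch below makes concrete what it means to use Shannon's entropy as a per-head measure. It computes the entropy of each query's softmax attention distribution in one causal head. That entropy is normalized by its maximum, log(number of visible keys), and averaged over queries. Values near 0 roughly correspond to the collapse extreme and values near 1 to the overload extreme. Several choices here are assumptions and do not come from the paper: the input layout (ragged lists of already-masked logits), the normalization, and the `entropy_penalty` helper with its `target`/`tolerance` parameters. In particular, `entropy_penalty` is a hypothetical squared-hinge regularizer for illustration, not the entropy regularization technique the authors propose.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy H(p) = -sum p log p in nats (0 log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def head_normalized_entropy(causal_logits: Sequence[Sequence[float]]) -> float:
    """Mean normalized attention entropy of one head.

    Assumed layout: causal_logits[i] holds query i's logits over keys 0..i
    (the causal mask is applied by truncation), so row i has i + 1 entries
    and its maximum possible entropy is log(i + 1). Rows with a single key
    always have entropy 0 and are skipped.
    """
    ratios = []
    for row in causal_logits:
        if len(row) < 2:
            continue
        ratios.append(shannon_entropy(softmax(row)) / math.log(len(row)))
    return sum(ratios) / len(ratios) if ratios else 0.0


def entropy_penalty(head_entropies: Sequence[float],
                    target: float = 0.5,
                    tolerance: float = 0.2) -> float:
    """HYPOTHETICAL regularizer (not the paper's): squared hinge on |H - target|.

    Heads whose normalized entropy lies within `tolerance` of `target` cost
    nothing. Heads drifting toward collapse (H -> 0) or overload (H -> 1)
    are penalized quadratically. Returns the mean penalty over heads.
    """
    total = 0.0
    for h in head_entropies:
        excess = max(0.0, abs(h - target) - tolerance)
        total += excess ** 2
    return total / len(head_entropies)
```

Consider a head whose logits are all zero, so it attends uniformly. It scores 1.0 (up to rounding), the overload extreme. A head that puts a large logit (for example 10.0) on the first token of every row scores close to 0, the collapse extreme. In a real model the same quantities would be computed on tensors inside the attention layer rather than on Python lists.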
