Entropy-Guided Attention for Private LLMs

Abstract

The pervasiveness of proprietary language models has raised critical privacy concerns, necessitating advances in private inference (PI), where computations are performed directly on encrypted data without revealing users' sensitive information. While PI offers a promising solution, its practical deployment is hindered by substantial communication and latency overheads, which stem primarily from nonlinear operations. To address this, we introduce an information-theoretic framework that characterizes the role of nonlinearities in decoder-only language models, laying a principled foundation for optimizing transformer architectures for the demands of PI. Using Shannon's entropy as a quantitative measure, we uncover a previously unexplored dual role of nonlinearities: beyond ensuring training stability, they are crucial for maintaining attention-head diversity. Specifically, we find that removing them triggers two critical failure modes: "entropy collapse" in deeper layers, which destabilizes training, and "entropic overload" in earlier layers, which leads to under-utilization of the representational capacity of Multi-Head Attention (MHA). We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload. Additionally, we explore PI-friendly alternatives to layer normalization for preventing entropy collapse and stabilizing the training of LLMs with reduced nonlinearities. Our study bridges the gap between information theory and architectural design, establishing entropy dynamics as a principled guide for developing efficient PI architectures. The code and implementation are available at https://github.com/Nandan91/entropy-guided-attention-llm.
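The abstract uses the Shannon entropy of attention distributions as its diagnostic. The sketch below is a minimal illustration of that measure, not the authors' implementation (the linked repository holds that): it computes each head's mean attention entropy, normalizes it by the maximum value ln(n), and labels the two extremes. Entropy near 0 means a head puts almost all its weight on one key (the regime associated with entropy collapse); entropy near the maximum means a head spreads its weight almost uniformly (the regime associated with entropic overload). The function names, the absence of a causal mask, and the 0.1/0.9 thresholds are illustrative assumptions.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one query's attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs: List[float]) -> float:
    """Shannon entropy H(p) = -sum_i p_i * ln(p_i), in nats; 0 * ln 0 is taken as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalized_head_entropy(score_rows: List[List[float]]) -> float:
    """Mean attention entropy of one head, divided by its maximum ln(n).

    Assumption: every query row attends over the same n >= 2 keys (no causal
    mask), so the result lies in [0, 1]: 0 = one-hot, 1 = uniform attention.
    """
    n = len(score_rows[0])
    if n < 2:
        raise ValueError("need at least two keys per query")
    mean_h = sum(shannon_entropy(softmax(row)) for row in score_rows) / len(score_rows)
    return mean_h / math.log(n)


# Hypothetical thresholds, chosen only for illustration.
LOW, HIGH = 0.1, 0.9


def diagnose(score_rows: List[List[float]]) -> str:
    """Label a head by where its normalized attention entropy falls."""
    h = normalized_head_entropy(score_rows)
    if h < LOW:
        return "collapse-like"   # weight concentrated on a single key
    if h > HIGH:
        return "overload-like"   # weight spread almost uniformly
    return "balanced"


if __name__ == "__main__":
    heads = {
        "sharp": [[10.0, 0.0, 0.0, 0.0]],
        "uniform": [[0.0, 0.0, 0.0, 0.0]],
        "mixed": [[2.0, 1.0, 0.0, 0.0]],
    }
    for name, rows in heads.items():
        print(name, round(normalized_head_entropy(rows), 3), diagnose(rows))
```

Normalizing by ln(n) makes heads with different context lengths comparable; a single snapshot like this only illustrates the two extremes, whereas the failure modes described above concern how entropy evolves across layers during training.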