A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

Abstract

The remarkable success of Large Language Models (LLMs), driven by their unprecedented performance across a wide range of applications, has opened a promising path toward Artificial General Intelligence for both academia and industry. As LLMs continue to gain prominence in research and commercial domains, their security and safety implications have become a growing concern, not only for researchers and corporations but also for every nation. Existing surveys on LLM safety focus primarily on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire "lifechain" of LLMs. To address this gap, this paper introduces, for the first time, the concept of "full-stack" safety, which systematically considers safety issues throughout the entire process of LLM training, deployment, and eventual commercialization. Compared with existing LLM safety surveys, our work offers several distinctive advantages: (I) Comprehensive Perspective. We define the complete LLM lifecycle as encompassing data preparation, pre-training, post-training, deployment, and final commercialization. To our knowledge, this is the first safety survey to cover the entire lifecycle of LLMs. (II) Extensive Literature Support. Our research is grounded in an exhaustive review of more than 800 papers, which supports comprehensive coverage and systematic organization of security issues from a holistic viewpoint. (III) Unique Insights. Through systematic literature analysis, we develop reliable roadmaps and perspectives for each chapter. Our work identifies promising research directions, including safety in data generation, alignment techniques, model editing, and LLM-based agent systems. These insights provide valuable guidance for researchers pursuing future work in this field.
