Thus Spake Long-Context Large Language Model

February 24, 2025
Authors: Xiaoran Liu, Ruixiao Li, Mianqiu Huang, Zhigeng Liu, Yuerong Song, Qipeng Guo, Siyang He, Qiqi Wang, Linlin Li, Qun Liu, Yaqian Zhou, Xuanjing Huang, Xipeng Qiu
cs.AI

Abstract

Long context is an important topic in Natural Language Processing (NLP). It runs through the development of NLP architectures and offers immense opportunities for Large Language Models (LLMs), giving them lifelong-learning potential akin to that of humans. Unfortunately, the pursuit of long context is accompanied by numerous obstacles. Nevertheless, long context remains a core competitive advantage for LLMs. In the past two years, the context length of LLMs has achieved a breakthrough extension to millions of tokens. Moreover, research on long-context LLMs has expanded from length extrapolation to a comprehensive focus on architecture, infrastructure, training, and evaluation technologies.

Inspired by the symphonic poem Thus Spake Zarathustra, we draw an analogy between the journey of extending the context of LLMs and humans' attempts to transcend their mortality. In this survey, we illustrate how an LLM struggles between the tremendous need for a longer context and the equally pressing need to accept that its context is ultimately finite. To this end, we give a global picture of the lifecycle of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation, showcasing the full spectrum of long-context technologies. At the end of this survey, we present 10 unanswered questions currently faced by long-context LLMs. We hope this survey can serve as a systematic introduction to the research on long-context LLMs.
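The abstract names length extrapolation as the starting point of this research line: making a model trained on short sequences usable on longer ones. As a minimal illustrative sketch (not taken from the paper), the code below shows one well-known technique of this kind, position interpolation for rotary position embeddings (RoPE), where positions are rescaled so a longer input reuses the angle range seen during training. The function name rope_angles and the 4k-to-16k setting are assumptions made for this example.

```python
import torch

def rope_angles(seq_len: int, head_dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotation angles for RoPE, with optional position interpolation."""
    # Standard RoPE inverse frequencies, one per pair of dimensions.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    # Position interpolation: dividing positions by `scale` squeezes a longer
    # sequence back into the position range the model saw during training.
    positions = torch.arange(seq_len).float() / scale
    return torch.outer(positions, inv_freq)  # shape: (seq_len, head_dim // 2)

# Hypothetical setting: a model pretrained with a 4k context, evaluated at 16k.
train_len, eval_len, head_dim = 4096, 16384, 128
angles = rope_angles(eval_len, head_dim, scale=eval_len / train_len)
print(angles.shape)  # torch.Size([16384, 64])
```

With scale=1.0 this reduces to plain RoPE; setting scale to the ratio of evaluation length to training length keeps the rotation angles within (approximately) the trained range, which is why interpolation tends to degrade less than naive extrapolation.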
