A Comprehensive Survey on Long Context Language Modeling

Abstract

Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs effectively and efficiently. In this paper, we present a comprehensive survey of recent advances in long-context modeling for large language models. Our survey is structured around three key aspects: how to obtain effective and efficient LCLMs, how to train and deploy LCLMs efficiently, and how to evaluate and analyze LCLMs comprehensively. For the first aspect, we discuss data strategies, architectural designs, and workflow approaches oriented toward long-context processing. For the second aspect, we provide a detailed examination of the infrastructure required for LCLM training and inference. For the third aspect, we present evaluation paradigms for long-context comprehension and long-form generation, as well as behavioral analysis and mechanism interpretability of LCLMs. Beyond these three key aspects, we explore the diverse application scenarios in which existing LCLMs have been deployed and outline promising directions for future development. This survey provides an up-to-date review of the literature on long-context LLMs, which we hope will serve as a valuable resource for both researchers and engineers. An associated GitHub repository collecting the latest papers and repos is available at https://github.com/LCLM-Horizon/A-Comprehensive-Survey-For-Long-Context-Language-Modeling (LCLM-Horizon).

