The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective

December 12, 2024
Authors: Javier de la Rosa, Vladislav Mikhailov, Lemei Zhang, Freddy Wetjen, David Samuel, Peng Liu, Rolv-Arild Braaten, Petter Mæhlum, Magnus Breder Birkenes, Andrey Kutuzov, Tita Enstad, Svein Arne Brygfjeld, Jon Atle Gulla, Stephan Oepen, Erik Velldal, Wilfred Østgulen, Liljia Øvrelid, Aslak Sira Myhre
cs.AI

Abstract

The use of copyrighted materials in training generative language models raises critical legal and ethical questions. This paper presents a framework for and the results of empirically assessing the impact of copyrighted materials on the performance of large language models (LLMs) for Norwegian. We found that both books and newspapers contribute positively when the models are evaluated on a diverse set of Norwegian benchmarks, while fiction works possibly lead to decreased performance. Our experiments could inform the creation of a compensation scheme for authors whose works contribute to AI development.
