Should We Really Edit Language Models? On the Evaluation of Edited Language Models
October 24, 2024
Authors: Qi Li, Xiang Liu, Zhenheng Tang, Peijie Dong, Zeyu Li, Xinglin Pan, Xiaowen Chu
cs.AI
Abstract
Model editing has become an increasingly popular alternative for efficiently updating knowledge within language models. Current methods mainly focus on reliability, generalization, and locality, and many methods excel across these criteria. Some recent works disclose the pitfalls of these editing methods, such as knowledge distortion or conflict. However, the general abilities of post-edited language models remain unexplored. In this paper, we perform a comprehensive evaluation of various editing methods across different language models and report the following findings. (1) Existing editing methods lead to inevitable performance deterioration on general benchmarks, indicating that they preserve the general abilities of the model only within a few dozen edits; once the number of edits grows slightly larger, the intrinsic knowledge structure of the model is disrupted or even completely destroyed. (2) Instruction-tuned models are more robust to editing, showing a smaller performance drop on general knowledge after editing. (3) Large-scale language models are more resistant to editing than smaller models. (4) The safety of edited models is significantly weakened, even for safety-aligned models. Our findings indicate that current editing methods are only suitable for small-scale knowledge updates within language models, which motivates further research on more practical and reliable editing methods. Code and reproduction details are available at https://github.com/lqinfdim/EditingEvaluation.
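The degradation pattern in finding (1) can be illustrated in miniature. The sketch below is not the paper's evaluation pipeline: it applies random rank-one perturbations to a single GPT-2 MLP projection as a crude stand-in for locate-and-edit methods (e.g., ROME/MEMIT), which install structured rank-one updates in MLP weights, and tracks perplexity on one probe sentence as a proxy for general ability. The model name, perturbation scale, target layer, and edit count are illustrative assumptions.

```python
# Minimal sketch (not the paper's protocol): repeatedly perturb one MLP weight
# of a small causal LM and compare perplexity on generic text before and after.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumption: any small causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """Perplexity of `text` under the current model (proxy for general ability)."""
    enc = tok(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    return float(torch.exp(loss))

@torch.no_grad()
def rank_one_edit(weight: torch.Tensor, scale: float = 1e-2) -> None:
    """Add a normalized random rank-one perturbation W += scale * u v^T in place,
    standing in for a single weight-level knowledge edit (not an actual ROME update)."""
    u = torch.randn(weight.shape[0], 1)
    v = torch.randn(1, weight.shape[1])
    weight += scale * (u @ v) / (u.norm() * v.norm())

probe = "The Eiffel Tower is located in the city of Paris."
print("perplexity before edits:", perplexity(probe))

# Apply many sequential "edits" to one projection, then re-measure.
target = model.transformer.h[5].mlp.c_proj.weight  # GPT-2 layer-5 MLP output projection
for _ in range(100):
    rank_one_edit(target)
print("perplexity after 100 edits:", perplexity(probe))
```

In the paper's setting, the edits are real knowledge edits produced by the editing methods under study, and general ability is measured on standard benchmarks rather than a single probe sentence.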