EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models

April 21, 2025
作者: Ziwen Xu, Shuxun Wang, Kewei Xu, Haoming Xu, Mengru Wang, Xinle Deng, Yunzhi Yao, Guozhou Zheng, Huajun Chen, Ningyu Zhang
cs.AI

Abstract

In this paper, we introduce EasyEdit2, a framework designed to enable plug-and-play adjustability for controlling Large Language Model (LLM) behaviors. EasyEdit2 supports a wide range of test-time interventions, including safety, sentiment, personality, reasoning patterns, factuality, and language features. Unlike its predecessor, EasyEdit2 features a new architecture specifically designed for seamless model steering. It comprises key modules such as the steering vector generator and the steering vector applier, which enable automatic generation and application of steering vectors to influence the model's behavior without modifying its parameters. One of the main advantages of EasyEdit2 is its ease of use: users do not need extensive technical knowledge. With just a single example, they can effectively guide and adjust the model's responses, making precise control both accessible and efficient. Empirically, we report model steering performance across different LLMs, demonstrating the effectiveness of these techniques. We have released the source code on GitHub at https://github.com/zjunlp/EasyEdit along with a demonstration notebook. In addition, we provide a demo video at https://zjunlp.github.io/project/EasyEdit2/video for a quick introduction.
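The abstract describes test-time steering: a steering vector is generated and then applied at inference time to shift the model's behavior without updating any parameters. The sketch below illustrates that underlying idea with activation addition via a forward hook on a Hugging Face causal LM. It is not EasyEdit2's actual interface; the model name, layer index, scaling factor, and the randomly initialized vector are illustrative assumptions standing in for what a steering vector generator would produce.

```python
# Minimal sketch of test-time activation steering (not EasyEdit2's API).
# A vector is added to the residual stream of one decoder block during
# generation; the model's weights are never modified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model chosen only for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

layer_idx = 6   # which decoder block to steer (assumption)
scale = 4.0     # steering strength (assumption)

# In practice the vector would come from a steering vector generator,
# e.g. a difference of activations on contrastive prompts; a random
# unit direction is used here purely as a stand-in.
steering_vec = torch.randn(model.config.hidden_size)
steering_vec = steering_vec / steering_vec.norm()

def steer(module, inputs, output):
    # Decoder blocks return a tuple; the first element is the hidden states.
    hidden_states = output[0] + scale * steering_vec.to(output[0].dtype)
    return (hidden_states,) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(steer)
try:
    ids = tok("The movie was", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=20, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so the model behaves normally again
```

Removing the hook restores the original behavior, which is what makes this kind of intervention plug-and-play: the steering direction and strength can be swapped or disabled per request without any retraining.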
