
Robust Multi-bit Text Watermark with LLM-based Paraphrasers

December 4, 2024
Authors: Xiaojun Xu, Jinghan Jia, Yuanshun Yao, Yang Liu, Hang Li
cs.AI

Abstract

We propose an imperceptible multi-bit text watermark embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that their paraphrasing difference, reflected in the text semantics, can be identified by a trained decoder. To embed our multi-bit watermark, we use the two paraphrasers alternately to encode a pre-defined binary code at the sentence level. We then use a text classifier as the decoder to recover each bit of the watermark. Through extensive experiments, we show that our watermarks can achieve over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantic information of the original sentence. More importantly, our pipeline is robust under word-substitution and sentence-paraphrasing perturbations and generalizes well to out-of-distribution data. We also show the stealthiness of our watermark with LLM-based evaluation. We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.
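The sentence-level embedding and decoding loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two paraphrasers and the bit classifier here are hypothetical stand-ins (the actual pipeline uses a pair of fine-tuned 1.1B LLM paraphrasers and a trained text-classifier decoder).

```python
from typing import Callable, List

def embed_watermark(sentences: List[str], bits: List[int],
                    paraphrasers: List[Callable[[str], str]]) -> List[str]:
    """Encode one bit per sentence by routing it through paraphraser 0 or 1."""
    return [paraphrasers[b](s) for s, b in zip(sentences, bits)]

def decode_watermark(sentences: List[str],
                     bit_classifier: Callable[[str], int]) -> List[int]:
    """Recover each bit by running the decoder classifier on every sentence."""
    return [bit_classifier(s) for s in sentences]

# Toy stand-ins: paraphraser 1 appends a marker word, and the "classifier"
# just checks for it. In the real pipeline both are learned models whose
# behavioral difference lives in the text semantics, not a surface token.
p0 = lambda s: s
p1 = lambda s: s + " indeed"
clf = lambda s: int(s.endswith("indeed"))

text = ["The sky is blue.", "Water boils at 100 C.", "Cats sleep a lot."]
code = [1, 0, 1]
marked = embed_watermark(text, code, [p0, p1])
assert decode_watermark(marked, clf) == code
```

The key design point mirrored here is that the watermark capacity scales with the number of sentences (one bit each), and decoding needs no access to the original unwatermarked text.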
