Robust Multi-bit Text Watermark with LLM-based Paraphrasers

Abstract

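We propose an imperceptible multi-bit text watermark that is embedded by paraphrasing with LLMs. We fine-tune a pair of LLM paraphrasers that are designed to behave differently, so that the difference in their paraphrasing, as reflected in the text semantics, can be identified by a trained decoder. To embed the multi-bit watermark, we alternate between the two paraphrasers to encode a pre-defined binary code at the sentence level. We then use a text classifier as the decoder to recover each bit of the watermark. Through extensive experiments, we show that our watermark achieves over 99.99% detection AUC with small (1.1B) text paraphrasers while preserving the semantic information of the original sentence. More importantly, our pipeline is robust to word substitution and sentence paraphrasing perturbations and generalizes well to out-of-distribution data. We also demonstrate the stealthiness of our watermark with LLM-based evaluation. We open-source the code: https://github.com/xiaojunxu/multi-bit-text-watermark.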
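The sentence-level encoding and decoding described above can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the released implementation: `toy_paraphraser_0` and `toy_paraphraser_1` take the place of the two fine-tuned LLM paraphrasers, `toy_decoder` takes the place of the trained text classifier, and repeating the bit string cyclically when the text has more sentences than bits is an assumption made only for illustration.

```python
import re
from typing import Callable, List, Sequence, Tuple

def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Hypothetical stand-ins for the two fine-tuned LLM paraphrasers.
# Each leaves a distinct, easily detected trace; a real paraphraser
# would rewrite the sentence while preserving its meaning.
def toy_paraphraser_0(sentence: str) -> str:
    return "Indeed, " + sentence[0].lower() + sentence[1:]

def toy_paraphraser_1(sentence: str) -> str:
    return "Notably, " + sentence[0].lower() + sentence[1:]

# Hypothetical stand-in for the trained text classifier (decoder).
def toy_decoder(sentence: str) -> int:
    return 1 if sentence.startswith("Notably, ") else 0

def embed(
    text: str,
    bits: Sequence[int],
    paraphrasers: Tuple[Callable[[str], str], Callable[[str], str]],
) -> str:
    # Sentence i is rewritten by the paraphraser selected by bit i.
    # Assumption: the code is repeated cyclically over longer texts.
    out = []
    for i, sentence in enumerate(split_sentences(text)):
        bit = bits[i % len(bits)]
        out.append(paraphrasers[bit](sentence))
    return " ".join(out)

def extract(text: str, decoder: Callable[[str], int]) -> List[int]:
    # Classify every sentence independently to recover one bit each.
    return [decoder(sentence) for sentence in split_sentences(text)]
```

Calling `embed(text, [1, 0, 1], (toy_paraphraser_0, toy_paraphraser_1))` on a three-sentence text and then `extract(marked, toy_decoder)` recovers the three bits in this toy setting. The sketch makes no claim about robustness or detection quality, which in the paper depend on the learned paraphrasers and decoder.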

