How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

Abstract

Text-to-image models are trained on large datasets collected by scraping image-text pairs from the internet. These datasets often include private, copyrighted, and licensed material. Training models on such datasets enables them to generate images with such content, which might violate copyright laws and individual privacy. This phenomenon is termed imitation: the generation of images whose content has recognizable similarity to the training images. In this work, we study the relationship between a concept's frequency in the training dataset and the ability of a model to imitate it. We seek to determine the point at which a model has been trained on enough instances of a concept to imitate it, which we call the imitation threshold. We pose this question as a new problem, Finding the Imitation Threshold (FIT), and propose an efficient approach that estimates the imitation threshold without incurring the colossal cost of training multiple models from scratch. We experiment with two domains, human faces and art styles, for which we create four datasets, and we evaluate three text-to-image models that were trained on two pretraining datasets. Our results reveal that the imitation threshold of these models lies in the range of 200-600 images, depending on the domain and the model. The imitation threshold can provide an empirical basis for copyright violation claims and can act as a guiding principle for text-to-image model developers who aim to comply with copyright and privacy laws. We release the code and data at https://github.com/vsahil/MIMETIC-2.git, and the project's website is hosted at https://how-many-van-goghs-does-it-take.github.io.
