SketchAgent: Language-Driven Sequential Sketch Generation

November 26, 2024
Authors: Yael Vinker, Tamar Rott Shaham, Kristine Zheng, Alex Zhao, Judith E Fan, Antonio Torralba
cs.AI

Abstract

Sketching serves as a versatile tool for externalizing ideas, enabling rapid exploration and visual communication across disciplines. While artificial systems have driven substantial advances in content creation and human-computer interaction, capturing the dynamic and abstract nature of human sketching remains challenging. In this work, we introduce SketchAgent, a language-driven, sequential sketch generation method that enables users to create, modify, and refine sketches through dynamic, conversational interactions. Our approach requires no training or fine-tuning. Instead, we leverage the sequential nature and rich prior knowledge of off-the-shelf multimodal large language models (LLMs). We present an intuitive sketching language, introduced to the model through in-context examples, enabling it to "draw" using string-based actions. These actions are processed into vector graphics and then rendered as a sketch on a pixel canvas, which can be accessed again for further tasks. By drawing stroke by stroke, our agent captures the evolving, dynamic qualities intrinsic to sketching. We demonstrate that SketchAgent can generate sketches from diverse prompts, engage in dialogue-driven drawing, and collaborate meaningfully with human users.
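To make the "string-based actions processed into vector graphics" step concrete, here is a minimal sketch in Python. The abstract does not specify the sketching language, so the action format used here (strokes written as space-separated grid cells such as `x10y20`), the 50-cell grid, and the helper names `parse_stroke` and `strokes_to_svg` are all assumptions made for illustration, not the authors' actual syntax or implementation.

```python
import re
from typing import List, Tuple

# Hypothetical action format (an assumption, not the paper's syntax):
# each stroke is a string of space-separated grid cells such as "x10y20".
CELL = re.compile(r"x(\d+)y(\d+)")


def parse_stroke(action: str) -> List[Tuple[int, int]]:
    """Turn one string-based action into a list of (x, y) grid points."""
    return [(int(x), int(y)) for x, y in CELL.findall(action)]


def strokes_to_svg(actions: List[str], grid: int = 50, cell_px: int = 10) -> str:
    """Convert stroke actions into an SVG document (the vector-graphics step)."""
    size = grid * cell_px
    paths = []
    for action in actions:
        points = parse_stroke(action)
        if len(points) < 2:
            continue  # a stroke needs at least two points to be drawn
        # Map grid cells to pixels; flip y so that larger y sits higher on the canvas.
        coords = [(x * cell_px, size - y * cell_px) for x, y in points]
        d = "M " + " L ".join(f"{px} {py}" for px, py in coords)
        paths.append(
            f'<path d="{d}" fill="none" stroke="black" stroke-width="2"/>'
        )
    body = "\n  ".join(paths)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{size}" height="{size}">\n  {body}\n</svg>'
    )


if __name__ == "__main__":
    # Two strokes emitted one after another, as a sequential agent might do.
    print(strokes_to_svg(["x10y10 x25y40 x40y10", "x15y20 x35y20"]))
```

Rendering the resulting SVG onto a pixel canvas would need an SVG rasterizer, which is left out here. Because each stroke is a separate string, the canvas can be re-rendered after every new action, which is one plausible way to support the stroke-by-stroke, conversational editing the abstract describes.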
