SMITE: Segment Me In TimE

October 24, 2024
Authors: Amirhossein Alimohammadi, Sauradip Nag, Saeid Asgari Taghanaki, Andrea Tagliasacchi, Ghassan Hamarneh, Ali Mahdavi Amiri
cs.AI

Abstract

Segmenting an object in a video presents significant challenges. Each pixel must be accurately labelled, and these labels must remain consistent across frames. The difficulty increases when the segmentation has arbitrary granularity, meaning the number of segments can vary arbitrarily, and the masks are defined from only one or a few sample images. In this paper, we address this issue by employing a pre-trained text-to-image diffusion model supplemented with an additional tracking mechanism. We demonstrate that our approach can effectively manage various segmentation scenarios and outperforms state-of-the-art alternatives.
