SAMPart3D: Segment Any Part in 3D Objects

November 11, 2024
Authors: Yunhan Yang, Yukun Huang, Yuan-Chen Guo, Liangjun Lu, Xiaoyang Wu, Edmund Y. Lam, Yan-Pei Cao, Xihui Liu
cs.AI

Abstract
3D part segmentation is a crucial and challenging task in 3D perception, playing a vital role in applications such as robotics, 3D generation, and 3D editing. Recent methods harness the powerful Vision Language Models (VLMs) for 2D-to-3D knowledge distillation, achieving zero-shot 3D part segmentation. However, these methods are limited by their reliance on text prompts, which restricts the scalability to large-scale unlabeled datasets and the flexibility in handling part ambiguities. In this work, we introduce SAMPart3D, a scalable zero-shot 3D part segmentation framework that segments any 3D object into semantic parts at multiple granularities, without requiring predefined part label sets as text prompts. For scalability, we use text-agnostic vision foundation models to distill a 3D feature extraction backbone, allowing scaling to large unlabeled 3D datasets to learn rich 3D priors. For flexibility, we distill scale-conditioned part-aware 3D features for 3D part segmentation at multiple granularities. Once the segmented parts are obtained from the scale-conditioned part-aware 3D features, we use VLMs to assign semantic labels to each part based on the multi-view renderings. Compared to previous methods, our SAMPart3D can scale to the recent large-scale 3D object dataset Objaverse and handle complex, non-ordinary objects. Additionally, we contribute a new 3D part segmentation benchmark to address the lack of diversity and complexity of objects and parts in existing benchmarks. Experiments show that our SAMPart3D significantly outperforms existing zero-shot 3D part segmentation methods, and can facilitate various applications such as part-level editing and interactive segmentation.

Summary

AI-Generated Summary

Paper Overview

This paper presents SAMPart3D, a framework for 3D part segmentation that requires no explicit annotations and achieves superior performance in zero-shot scenarios. It also provides multi-granularity segmentation visualizations of point clouds and meshes, with semantic annotations that aid understanding of the segmented parts.

Core Contribution

  • A multi-granularity segmentation visualization for point clouds and meshes.
  • SAMPart3D, a framework for annotation-free 3D part segmentation that outperforms existing methods in zero-shot settings.

Research Context

The research addresses the need for improved 3D part segmentation techniques without relying on predefined labels or textual prompts, leveraging vision and language models for knowledge distillation.

Keywords

3D Part Segmentation, Point Clouds, Meshes, Semantic Annotations, SAMPart3D, Zero-shot Segmentation, Vision-Language Models

Background

The research aims to fill the gap in 3D part segmentation by proposing SAMPart3D, a framework that overcomes the limitations of existing annotated datasets and generalizes well to open 3D objects.

Research Gap

Existing methods lack the ability to segment 3D parts without prior annotations, hindering their applicability to diverse and complex objects.

Technical Challenges

Challenges include generalizing to open 3D objects, utilizing unannotated 3D knowledge, and managing semantic ambiguity in 3D part segmentation.

Prior Approaches

Previous methods heavily relied on annotated datasets or textual cues for 3D part segmentation, limiting their adaptability to new objects and scenarios.

Methodology

The methodology involves leveraging large-scale pre-training, fine-tuning on specific samples, and semantic querying without training to achieve effective 3D part segmentation.

Theoretical Foundation

Utilizing DINOv2 for distilling 2D visual features into a 3D base network, and employing vision-language models for semantic labeling.
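The 2D-to-3D distillation step can be sketched as follows: per-point features from the 3D backbone are supervised, via an MSE loss, against 2D features (e.g. DINOv2) sampled at the pixel each point projects to in a rendered view. The array shapes, the precomputed projection, and the visibility mask are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def distill_loss(feat_3d, feat_2d, visibility):
    """MSE distillation loss between per-point 3D backbone features and
    2D features gathered at each point's projected pixel.

    feat_3d:    (N, C) features from the 3D backbone.
    feat_2d:    (N, C) 2D features (e.g. upsampled DINOv2) sampled at the
                pixel each 3D point projects to in a rendered view.
    visibility: (N,) boolean mask, True where the point is visible
                in that view.
    """
    sq_err = (feat_3d - feat_2d) ** 2      # per-point, per-channel error
    per_point = sq_err.mean(axis=1)        # average over channels
    return per_point[visibility].mean()    # supervise only visible points

# Toy check: 3D features of zeros against 2D features of ones.
loss = distill_loss(
    np.zeros((4, 3)),
    np.ones((4, 3)),
    np.array([True, True, True, False]),
)
```

In practice this loss would be summed over many rendered views so every point receives supervision from at least one viewpoint.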

Technical Architecture

Incorporating FeatUp for enhancing DINOv2 features, performing scale-conditioned clustering for 3D point clouds, and introducing long-range connections for capturing low-level features.
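The scale-conditioned part-aware features described above can be sketched as a small head that maps each point's backbone feature, together with a scalar granularity value, to an embedding; points whose embeddings cluster together at a given scale form one part. The layer sizes and the way the scale is injected (plain concatenation) are assumptions for illustration, not the paper's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)

class ScaleConditionedHead:
    """Toy MLP head producing scale-conditioned part-aware embeddings."""

    def __init__(self, in_dim, hidden, out_dim):
        # Random untrained weights; in the real system these are learned
        # during per-object fine-tuning.
        self.w1 = rng.standard_normal((in_dim + 1, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, out_dim)) * 0.1

    def __call__(self, feats, scale):
        # Append the granularity scalar to every point feature, so the
        # same network yields different groupings at different scales.
        s = np.full((feats.shape[0], 1), scale)
        x = np.concatenate([feats, s], axis=1)
        h = np.maximum(x @ self.w1, 0.0)   # ReLU
        return h @ self.w2                 # (N, out_dim) embeddings

head = ScaleConditionedHead(in_dim=8, hidden=16, out_dim=4)
feats = np.zeros((5, 8))
emb_fine = head(feats, scale=0.0)
emb_coarse = head(feats, scale=1.0)
```

Varying only the scale input changes the embeddings, which is what lets one trained head produce segmentations at multiple granularities.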

Implementation Details

Utilizing an MSE loss for distilling 2D features, per-sample fine-tuning for distilling 2D segmentation masks, and HDBSCAN clustering for feature grouping.
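The feature-grouping step can be illustrated with a simplified stand-in for HDBSCAN: link points whose part-aware embeddings fall within a distance threshold and take connected components as part labels. The real pipeline uses HDBSCAN, which additionally handles varying density and marks noise points; this union-find sketch only conveys the idea of turning embeddings into part IDs:

```python
import numpy as np

def group_features(emb, eps=0.5):
    """Greedy density grouping: points whose embeddings are within `eps`
    of each other end up in the same part (a toy stand-in for HDBSCAN)."""
    n = emb.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise embedding distances; link pairs closer than eps.
    dist = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i, j] < eps:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    _, labels = np.unique(roots, return_inverse=True)
    return labels  # (N,) integer part label per point

emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = group_features(emb, eps=0.5)
```

Here the two nearby pairs of points fall into two distinct parts.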

Innovation Points

  • Introducing SAMPart3D for 3D part segmentation without annotations.
  • Utilizing vision-language models for semantic querying in 3D object parts.

Experimental Validation

The experimental validation involves thorough evaluations on the PartObjaverse-Tiny dataset, comparisons with existing methods, and ablation studies to analyze the model components' impact.

Setup

  • Pre-training on 200,000 high-quality objects for 7 days using the PTv3-object model.
  • Per-object fine-tuning with 15,000 mesh surface points and 36 rendered object views for generating 2D segmentation masks.
  • Use of HDBSCAN for feature clustering and GPT-4o for semantic queries.

Metrics

Evaluation based on segmentation accuracy, generalization to unseen objects, and efficiency in zero-shot scenarios.
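Segmentation accuracy for part segmentation is typically reported as a mean IoU between predicted and ground-truth parts. A minimal sketch, assuming a greedy best-match between predicted and ground-truth part labels (the benchmark's exact matching protocol may differ):

```python
import numpy as np

def mean_iou(pred, gt):
    """Mean IoU over ground-truth parts, each matched greedily to the
    predicted part with the highest overlap.

    pred, gt: (N,) integer part labels per point.
    """
    ious = []
    for g in np.unique(gt):
        best = 0.0
        for p in np.unique(pred):
            inter = np.sum((gt == g) & (pred == p))
            union = np.sum((gt == g) | (pred == p))
            best = max(best, inter / union)
        ious.append(best)
    return float(np.mean(ious))

# Label IDs are arbitrary: a relabeled-but-identical partition scores 1.0.
score = mean_iou(np.array([0, 0, 1, 1]), np.array([1, 1, 0, 0]))
```

Because labels are matched by overlap rather than by ID, the metric is suitable for zero-shot settings where predicted part IDs carry no fixed semantics.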

Results

Superior performance of SAMPart3D compared to existing methods, especially in zero-shot settings, demonstrated through quantitative and qualitative results.

Comparative Analysis

Comparisons with other 3D part segmentation methods, highlighting the advantages of SAMPart3D in diverse object segmentation.

Impact and Implications

SAMPart3D demonstrates practical applications in material editing, shape manipulation, and hierarchical segmentation driven by user interactions.

Key Findings

  • SAMPart3D outperforms existing methods in zero-shot 3D part segmentation.
  • Introduction of PartObjaverse-Tiny dataset to enhance dataset diversity and complexity.

Limitations

Limitations include the impact of inaccurate 2D masks on the final segmentation and the training time required for feature grouping.

Future Directions

Future research opportunities include refining segmentation accuracy, enhancing model efficiency, and exploring real-world applications in 3D object manipulation.

Practical Significance

SAMPart3D's advancements have practical implications in various industries, enabling more efficient and accurate 3D part segmentation for applications like 3D modeling and virtual reality.
