IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Abstract

Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics. Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs, while still struggling with inherent ambiguities between lighting and material. On the other hand, learning-based approaches leverage rich material priors from existing 3D object datasets but face challenges with maintaining multi-view consistency. In this paper, we introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations. Our method achieves accurate and multi-view consistent estimation of surface normals and material properties. This is made possible through a novel cross-view, cross-domain attention module and an illumination-augmented, view-adaptive training strategy. Additionally, we introduce ARB-Objaverse, a new dataset that provides large-scale multi-view intrinsic data and renderings under diverse lighting conditions, supporting robust training. Extensive experiments demonstrate that IDArb outperforms state-of-the-art methods both qualitatively and quantitatively. Moreover, our approach facilitates a range of downstream tasks, including single-image relighting, photometric stereo, and 3D reconstruction, highlighting its broad applications in realistic 3D content creation.
