Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

Abstract

Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial for streamlining this process. Recent studies have used ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and their corresponding rendered visual objects. Moreover, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs on ground-truth parametric sequences, enabling them to generate logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. The two stages alternate throughout training, ensuring balanced learning and preserving the benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.
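
The alternation between the SL and VF stages can be pictured as a simple loop. The following is a minimal sketch for illustration only, not the authors' implementation: the names `alternate_stages`, `sl_step`, `vf_step`, and `split_by_preference`, the choice to run SL first in each round, and the use of a numeric score with a threshold to decide which sequences are "visually preferred" are all assumptions that the abstract does not specify.

```python
from typing import Callable, List, Tuple


def alternate_stages(
    num_rounds: int,
    sl_step: Callable[[int], None],
    vf_step: Callable[[int], None],
) -> List[str]:
    """Run the two stages in alternation and return the order executed.

    sl_step and vf_step are hypothetical callbacks standing in for one
    block of sequential learning and one block of visual feedback.
    """
    schedule: List[str] = []
    for round_idx in range(num_rounds):
        sl_step(round_idx)  # train on ground-truth parametric sequences
        schedule.append("SL")
        vf_step(round_idx)  # reward or penalize based on rendered objects
        schedule.append("VF")
    return schedule


def split_by_preference(
    scored: List[Tuple[str, float]], threshold: float
) -> Tuple[List[str], List[str]]:
    """Split generated sequences into rewarded and penalized groups.

    Each item is (parametric_sequence, visual_score). The score and the
    threshold are assumed placeholders for whatever judgment is applied
    to the rendered object.
    """
    preferred = [seq for seq, score in scored if score >= threshold]
    rejected = [seq for seq, score in scored if score < threshold]
    return preferred, rejected
```

In this sketch the VF callback would call something like `split_by_preference` on freshly generated sequences and then update the model to favor the first group over the second; how that update is performed is left open here.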
