
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models

January 31, 2025
Authors: Ruiyu Wang, Yu Yuan, Shizhao Sun, Jiang Bian
cs.AI

Abstract

Creating Computer-Aided Design (CAD) models requires significant expertise and effort. Text-to-CAD, which converts textual descriptions into CAD parametric sequences, is crucial in streamlining this process. Recent studies have utilized ground-truth parametric sequences, known as sequential signals, as supervision to achieve this goal. However, CAD models are inherently multimodal, comprising parametric sequences and corresponding rendered visual objects. Besides, the rendering process from parametric sequences to visual objects is many-to-one. Therefore, both sequential and visual signals are critical for effective training. In this work, we introduce CADFusion, a framework that uses Large Language Models (LLMs) as the backbone and alternates between two training stages: the sequential learning (SL) stage and the visual feedback (VF) stage. In the SL stage, we train LLMs using ground-truth parametric sequences, enabling the generation of logically coherent parametric sequences. In the VF stage, we reward parametric sequences that render into visually preferred objects and penalize those that do not, allowing LLMs to learn how rendered visual objects are perceived and evaluated. These two stages alternate throughout the training, ensuring balanced learning and preserving benefits of both signals. Experiments demonstrate that CADFusion significantly improves performance, both qualitatively and quantitatively.
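The alternating two-stage schedule described in the abstract can be sketched roughly as follows. This is a minimal toy sketch, not the paper's implementation: the function names (`sl_step`, `vf_step`, `train`), the dictionary-based "model", and the reward function are all illustrative stand-ins for a real LLM fine-tuning loop and visual-preference reward model.

```python
# Toy sketch of an alternating SL/VF training schedule (illustrative only).

def sl_step(model, batch):
    """Sequential-learning step: supervised update on ground-truth
    parametric sequences (stub: just counts updates, returns a mock loss)."""
    model["sl_updates"] += 1
    return 1.0 / model["sl_updates"]  # mock decreasing loss

def vf_step(model, batch, reward_fn):
    """Visual-feedback step: reward sequences whose renders are visually
    preferred and penalize the rest (stub: averages stand-in rewards)."""
    model["vf_updates"] += 1
    rewards = [reward_fn(seq) for seq in batch]
    return sum(rewards) / len(rewards)

def train(model, sl_data, vf_data, reward_fn, rounds=3):
    """Alternate the SL and VF stages each round, as the abstract describes,
    so neither signal dominates training."""
    history = []
    for _ in range(rounds):
        for batch in sl_data:          # SL stage: fit sequential signal
            sl_step(model, batch)
        for batch in vf_data:          # VF stage: fit visual signal
            vf_step(model, batch, reward_fn)
        history.append((model["sl_updates"], model["vf_updates"]))
    return history

model = {"sl_updates": 0, "vf_updates": 0}
history = train(model,
                sl_data=[["seq"]],
                vf_data=[["seq"]],
                reward_fn=lambda s: 1.0 if "seq" in s else -1.0,
                rounds=3)
print(history)  # [(1, 1), (2, 2), (3, 3)]
```

The point of the alternation (rather than running all SL before all VF) is that each round keeps the two signals balanced, which the abstract credits with preserving the benefits of both.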
