Generating executable CAD programs from images requires alignment between visual geometry and symbolic program representations, a capability that current methods fail to learn reliably as design complexity increases. Existing fine-tuning approaches rely on either limited supervised datasets or expensive post-training pipelines, resulting in brittle systems that restrict progress in generative CAD design. We argue that the primary bottleneck lies not in model or algorithmic capacity, but in the scarcity of diverse training examples that align visual geometry with program syntax. This limitation is especially acute because the collection of diverse and verified engineering datasets is both expensive and difficult to scale, constraining the development of robust generative CAD models. We introduce Geometric Inference Feedback Tuning (GIFT), a data augmentation framework that leverages geometric feedback to turn test-time compute into a bootstrapped set of high-quality training samples. GIFT combines two mechanisms: Soft-Rejection Sampling (GIFT-REJECT), which retains diverse high-fidelity programs beyond exact ground-truth matches, and Failure-Driven Augmentation (GIFT-FAIL), which converts near-miss predictions into synthetic training examples that improve robustness on challenging geometries. By amortizing inference-time search into the model parameters, GIFT captures the benefits of test-time scaling while reducing inference compute by 80%. It improves mean IoU by 12% over a strong supervised baseline and remains competitive with more complex multimodal systems, without requiring additional human annotation or specialized architectures.
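The abstract describes Soft-Rejection Sampling and Failure-Driven Augmentation only at a high level. The sketch below is one possible reading of how both could filter a batch of sampled programs. It rests on several assumptions that do not come from the paper. Geometry is compared as voxel sets rather than as images. `execute` stands in for a real CAD kernel. The thresholds `keep_threshold=0.9` and `near_miss_floor=0.5` and the toy `box w h d` language are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple

Voxel = Tuple[int, int, int]
Shape = FrozenSet[Voxel]


def iou(a: Shape, b: Shape) -> float:
    """Intersection-over-union of two voxelized shapes (1.0 if both are empty)."""
    union = len(a | b)
    return 1.0 if union == 0 else len(a & b) / union


@dataclass(frozen=True)
class Sample:
    target: Shape  # geometry the model sees as input (stand-in for an image)
    program: str   # CAD program that should reproduce that geometry


def gift_augment(
    target: Shape,
    candidates: List[str],
    execute: Callable[[str], Optional[Shape]],
    keep_threshold: float = 0.9,   # hypothetical soft-rejection cutoff
    near_miss_floor: float = 0.5,  # hypothetical lower bound for a "near miss"
) -> Tuple[List[Sample], List[Sample]]:
    """Split sampled programs into soft-accepted and failure-driven samples."""
    kept: List[Sample] = []
    relabeled: List[Sample] = []
    seen = set()
    for program in candidates:
        if program in seen:  # skip exact duplicates to keep the set diverse
            continue
        seen.add(program)
        shape = execute(program)
        if shape is None:  # program did not execute; nothing to learn from
            continue
        score = iou(shape, target)
        if score >= keep_threshold:
            # Soft rejection: keep any close-enough program, not only exact matches.
            kept.append(Sample(target, program))
        elif score >= near_miss_floor:
            # Failure-driven: the near-miss program is an exact answer for the
            # geometry it actually produced, so pair it with that geometry.
            relabeled.append(Sample(shape, program))
    return kept, relabeled


def toy_execute(program: str) -> Optional[Shape]:
    """Hypothetical one-command language: 'box w h d' fills a w x h x d block."""
    parts = program.split()
    if len(parts) != 4 or parts[0] != "box":
        return None
    try:
        w, h, d = (int(p) for p in parts[1:])
    except ValueError:
        return None
    return frozenset(
        (x, y, z) for x in range(w) for y in range(h) for z in range(d)
    )


target = toy_execute("box 10 10 10")
candidates = [
    "box 10 10 10",  # exact match, IoU 1.0
    "box 10 10 9",   # IoU 0.9: kept by soft rejection
    "box 10 10 7",   # IoU 0.7: near miss, relabeled
    "box 3 3 3",     # IoU 0.027: discarded
    "sphere 5",      # not executable in the toy language
    "box 10 10 10",  # duplicate
]
kept, relabeled = gift_augment(target, candidates, toy_execute)
print([s.program for s in kept])       # ['box 10 10 10', 'box 10 10 9']
print([s.program for s in relabeled])  # ['box 10 10 7']
```

In this reading, every relabeled sample is consistent by construction, because the program reproduces its paired geometry exactly. In an image-based pipeline, the paired input would presumably be a rendering of the predicted geometry rather than the voxel set used here.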