Deep learning in computer-aided design (CAD) remains fundamentally constrained by data scarcity: authentic CAD data is difficult to collect at scale, while synthetic data may not faithfully reflect real design practice. Rather than pursuing ever-larger CAD datasets, this paper instead treats CAD learning as a knowledge completion and calibration problem. It introduces KDH-CAD, a knowledge-data hybrid framework that integrates pretrained knowledge in foundation models, structured domain knowledge from textbooks and tutorials, and a very small amount of labeled CAD data. Domain knowledge is used to elicit and complete CAD-relevant concepts that are weakly expressed or under-represented in pretrained foundation models, while labeled CAD data calibrates these concepts in the latent space to account for task-specific geometric variability, without fine-tuning the foundation model. Experiments on real-world mechanical part classification show that KDH-CAD achieves strong performance in low-data regimes: it reaches 92.6% accuracy with only 250 training samples and 95.8% with 1,000 samples, and it continues to improve with additional data. This matches or exceeds state-of-the-art performance that typically requires an order of magnitude more data. These results suggest that combining pretrained foundation models with structured domain knowledge can substantially reduce reliance on large-scale CAD datasets, providing a principled and practical direction for data-efficient CAD learning.
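The abstract does not specify how KDH-CAD performs calibration, so the following is only a toy illustration of the general idea of adjusting knowledge-derived concept vectors with a few labeled samples while the encoder stays frozen. Everything in it is an assumption made for illustration: the prototype-blending rule, the `alpha` weight, the function names, and the synthetic vectors that stand in for foundation-model embeddings.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def calibrate_prototypes(concept_emb, sample_emb, labels, alpha=0.5):
    """Blend each concept vector with the mean embedding of its labeled samples.

    concept_emb: (n_classes, dim) stand-ins for knowledge-derived concept vectors.
    sample_emb:  (n_samples, dim) stand-ins for frozen-encoder embeddings.
    alpha:       hypothetical weight on the labeled data (0 keeps the prior).
    """
    concept_emb = l2_normalize(concept_emb)
    sample_emb = l2_normalize(sample_emb)
    calibrated = concept_emb.copy()
    for k in range(concept_emb.shape[0]):
        members = sample_emb[labels == k]
        if len(members) == 0:
            continue  # no labels for this class: keep the knowledge prior
        calibrated[k] = (1 - alpha) * concept_emb[k] + alpha * members.mean(axis=0)
    return l2_normalize(calibrated)


def classify(prototypes, sample_emb):
    """Assign each sample to the prototype with the highest cosine similarity."""
    scores = l2_normalize(sample_emb) @ l2_normalize(prototypes).T
    return np.argmax(scores, axis=1)


def make_toy_data(seed, n_per_class, n_classes=3, dim=8):
    """Synthetic data only: noisy concept priors and tight clusters per class."""
    rng = np.random.default_rng(seed)
    centers = np.eye(n_classes, dim)
    # Concept priors are deliberately noisier than the samples.
    concepts = centers + 0.5 * rng.normal(size=(n_classes, dim))
    labels = np.repeat(np.arange(n_classes), n_per_class)
    samples = centers[labels] + 0.1 * rng.normal(size=(len(labels), dim))
    return concepts, samples, labels
```

A possible usage is to build a small labeled set with `make_toy_data(seed=0, n_per_class=5)`, calibrate with `calibrate_prototypes(concepts, train_x, train_y, alpha=0.8)`, and compare `classify` results on held-out samples before and after calibration. No model weights are updated at any point, which mirrors the abstract's constraint of not fine-tuning the foundation model.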