Parametric computer-aided design records both the final geometry and the ordered construction history that determines how a part can be edited. Datasets for editable CAD research should therefore expose modeling operations, parameters, and feature dependencies together with validated geometry. We introduce FllumaOne, a code-native multimodal CAD dataset whose models are generated by executable Python programs in Flluma, a Qt/C++ CAD system built on OpenCASCADE. Each sample aligns its program with a structured feature tree, a training-oriented intermediate representation, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. The primary release, FllumaOne-100K, contains 100,000 accepted samples across four template-level complexity regimes. Each program is executed and retained only if it passes kernel geometry, solid validity, and export checks; release reports also record modality completeness and split-level duplicate tests. A Qwen2.5-Coder-1.5B LoRA baseline trained on 80,000 samples achieves 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity on the held-out 10,000-sample test split. For the 9,909 predictions that could be converted to surface point clouds, the mean normalized Chamfer Distance is 0.002124. The dataset supports conditioned CAD reconstruction, executable program synthesis, feature-tree prediction, B-Rep analysis, retrieval, design completion, and editable reverse engineering.
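For concreteness, a normalized Chamfer Distance between a predicted and a reference surface point cloud can be sketched as follows. This is a minimal illustration under assumed conventions, not the evaluation code behind the reported figure: the abstract does not state the normalization or the distance form, so the sketch assumes that both clouds are centered and scaled by the reference bounding-box diagonal and that the metric is the symmetric sum of mean squared nearest-neighbor distances. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def reference_frame(ref: Sequence[Point]) -> Tuple[List[float], float]:
    """Return the bounding-box center and diagonal of the reference cloud.

    Assumption: normalization uses the reference bounding box.
    """
    mins = [min(p[i] for p in ref) for i in range(3)]
    maxs = [max(p[i] for p in ref) for i in range(3)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(mins, maxs)]
    diagonal = math.dist(mins, maxs) or 1.0  # avoid dividing by zero for a single point
    return center, diagonal


def apply_frame(points: Sequence[Point], center: List[float], scale: float) -> List[Point]:
    """Translate points by -center and divide by scale."""
    return [tuple((p[i] - center[i]) / scale for i in range(3)) for p in points]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance with squared distances (assumed form).

    Brute force, O(len(a) * len(b)); fine for a sketch, too slow for large clouds.
    """
    a_to_b = sum(min(squared_distance(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(squared_distance(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


def normalized_chamfer(pred: Sequence[Point], ref: Sequence[Point]) -> float:
    """Chamfer Distance after mapping both clouds into the reference frame."""
    center, scale = reference_frame(ref)
    return chamfer_distance(apply_frame(pred, center, scale),
                            apply_frame(ref, center, scale))
```

Under these assumptions the value is scale-invariant, so parts of different physical size contribute comparably to the mean. A prediction identical to its reference scores 0.0.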