DeepJEB++: Foundation Model-Driven Large-Scale 3D Engineering Dataset via 2D Latent Space Augmentation

Data-driven engineering design is constrained by the lack of large-scale 3D datasets that pair geometry with physics-based performance labels. Existing 3D data augmentation techniques struggle to preserve subtle and diverse geometric variations, and the subsequent simulation labeling is difficult to automate because boundary conditions depend on each generated geometry. We present DeepJEB++, a foundation-model-driven data-augmentation framework that expands a small seed set of jet engine brackets into a large, simulation-labeled 3D dataset under constrained resources. Our key idea is to augment in the data-rich 2D latent space and then transfer the results to 3D. In Stage 1, we fine-tune a pretrained 2D latent diffusion model on multi-view renders and synthesize novel views by latent interpolation, retaining manufacturable designs through a vision-language-model (VLM) quality filter. In Stage 2, a domain-adapted generative foundation model lifts the validated images to 3D meshes. In Stage 3, an automated pipeline recognizes the load and bolt interfaces on each mesh and assigns finite-element labels (mass, stress, and displacement) without manual intervention. We assess augmentation quality along three intrinsic axes: manufacturability, label fidelity against the SimJEB ground truth, and distributional consistency. Starting from fewer than 400 seed designs, DeepJEB++ yields 15,360 simulation-labeled 3D brackets, a 40x expansion, using a single GPU per stage. The dataset will be made publicly available to support reproducible engineering-AI research.
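
As a rough illustration of the latent interpolation mentioned in Stage 1, the sketch below blends two latent vectors with spherical linear interpolation (slerp). The abstract does not state which interpolation scheme DeepJEB++ uses, so slerp is an assumption here, chosen because it is a common way to interpolate Gaussian diffusion latents without shrinking their norm. The function name `slerp`, the flattened latent size `dim`, and the random stand-in latents are all hypothetical; in a real pipeline the latents would come from the fine-tuned diffusion model's encoder.

```python
import math
import random


def slerp(z0, z1, t):
    """Spherical linear interpolation between two flattened latents.

    z0, z1: equal-length lists of floats (hypothetical flattened latents).
    t: blend weight in [0, 1]; t=0 returns z0, t=1 returns z1.
    """
    dot = sum(a * b for a, b in zip(z0, z1))
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    # Angle between the two latents, clamped for numerical safety.
    cos_omega = max(-1.0, min(1.0, dot / (n0 * n1)))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < 1e-8:
        # Nearly collinear latents: fall back to linear interpolation.
        return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


# Stand-in latents drawn from a standard normal (assumption: real latents
# would be produced by encoding two seed-bracket renders).
rng = random.Random(0)
dim = 4096  # hypothetical flattened latent size
z_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
z_b = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# Five evenly spaced blends from z_a to z_b, endpoints included.
blends = [slerp(z_a, z_b, i / 4) for i in range(5)]
```

Each intermediate entry of `blends` would then be decoded into an image and passed to the quality filter. Plain linear interpolation would also work as a baseline, but for high-dimensional Gaussian latents it pulls the midpoint toward the origin, which is why slerp is used in this sketch.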
