Procedural Content Generation (PCG) is powerful for creating high-quality 3D content, yet controlling it to produce desired shapes is difficult and often requires extensive parameter tuning. Inverse Procedural Content Generation aims to automatically find the best parameters for a given input condition. However, existing sampling-based and neural-network-based methods still suffer from costly sampling iterations or limited controllability. In this work, we present DI-PCG, a novel and efficient method for inverse PCG from general image conditions. At its core is a lightweight diffusion transformer model, in which PCG parameters are treated directly as the denoising target and the observed images serve as conditions to control parameter generation. DI-PCG is efficient and effective: with only 7.6M network parameters and 30 GPU hours to train, it recovers parameters accurately and generalizes well to in-the-wild images. Quantitative and qualitative experimental results validate the effectiveness of DI-PCG on inverse PCG and image-to-3D generation tasks. DI-PCG offers a promising approach to efficient inverse PCG and represents a valuable step toward a 3D generation paradigm that models how to construct a 3D asset using parametric models.
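The core idea, treating the PCG parameter vector itself as the denoising target of a diffusion process conditioned on the image, can be illustrated with a minimal conceptual sketch. Everything below is assumed for illustration: the noise schedule, the toy parameter vector, and the oracle denoiser stand in for the paper's learned 7.6M-parameter diffusion transformer and image encoder.

```python
import numpy as np

# Conceptual sketch of diffusion over PCG parameters (NOT the paper's model).
# A DDPM-style reverse process starts from Gaussian noise and iteratively
# denoises toward a parameter vector, guided by an image condition.

rng = np.random.default_rng(0)
T = 50                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.05, T)       # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

true_params = np.array([0.3, -0.7, 1.2])  # toy normalized PCG parameters
image_cond = true_params.copy()           # toy condition: encodes the target

def predict_noise(x_t, t, cond):
    """Stand-in for the learned eps_theta(x_t, t, cond): an oracle that
    recovers the injected noise from x_t and the clean target implied
    by the condition. A real model would be a conditional transformer."""
    ab = alpha_bars[t]
    return (x_t - np.sqrt(ab) * cond) / np.sqrt(1.0 - ab)

# DDPM reverse sampling: x_T ~ N(0, I), then denoise step by step.
x = rng.standard_normal(3)
for t in reversed(range(T)):
    eps = predict_noise(x, t, image_cond)
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / np.sqrt(alphas[t])
    if t > 0:  # no noise is added at the final step
        x += np.sqrt(betas[t]) * rng.standard_normal(3)

recovered = x  # would be fed to the procedural generator to build the 3D asset
```

With the oracle denoiser the chain recovers the toy parameters exactly; in DI-PCG the prediction instead comes from a trained network, and the recovered parameters are executed by the procedural generator to produce the final mesh.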