Current diffusion- and flow-based generative models for 3D shapes fall into two categories: those that distill pre-trained 2D image diffusion models, and those trained directly on 3D shapes. When training diffusion or flow models on 3D shapes, a crucial design choice is the shape representation. An effective shape representation should adhere to three design principles: it should allow efficient conversion of large 3D datasets to the representation form; it should provide a good tradeoff between approximation power and number of parameters; and it should have a simple tensorial form that is compatible with existing powerful neural architectures. While standard 3D shape representations such as volumetric grids and point clouds do not satisfy all of these principles simultaneously, in this paper we advocate a new representation that does. We introduce Mosaic-SDF (M-SDF): a simple 3D shape representation that approximates the Signed Distance Function (SDF) of a given shape using a set of local grids spread near the shape's boundary. The M-SDF representation is fast to compute for each shape individually, making it readily parallelizable; it is parameter efficient, as it covers only the space around the shape's boundary; and it has a simple matrix form, compatible with Transformer-based architectures. We demonstrate the efficacy of the M-SDF representation by using it to train a 3D generative flow model, including class-conditioned generation on the 3D Warehouse dataset and text-to-3D generation using a dataset of about 600k caption-shape pairs.
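The abstract describes M-SDF only at a high level. The following is a minimal sketch of the general idea, assuming that each local grid is stored as one row `[center (3), scale (1), k^3 SDF samples]`, that each grid is an axis-aligned cube of half-width `scale`, and that overlapping grids are blended by a plain average. All names (`encode_msdf`, `query`, `sample_boundary_centers`, ...) are hypothetical. An analytic sphere SDF stands in for the SDF of a real mesh, and grid centers are placed by random sampling on the surface rather than by whatever placement procedure the method actually uses.

```python
import math
import random


def sphere_sdf(p, radius=0.5):
    """Analytic SDF of a sphere at the origin (stand-in for a mesh SDF)."""
    return math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def sample_boundary_centers(n, radius=0.5, seed=0):
    """Hypothetical placement: n random points on the sphere's surface."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        centers.append([radius * c / norm for c in v])
    return centers


def grid_offsets(k):
    """k evenly spaced offsets in [-1, 1] along one axis."""
    return [-1.0 + 2.0 * i / (k - 1) for i in range(k)]


def encode_msdf(sdf, centers, scale, k):
    """Build one row [cx, cy, cz, scale, k^3 SDF samples] per local grid."""
    t = grid_offsets(k)
    rows = []
    for c in centers:
        # x-offset varies slowest, z-offset fastest: index = (i*k + j)*k + l
        values = [sdf((c[0] + scale * tx, c[1] + scale * ty, c[2] + scale * tz))
                  for tx in t for ty in t for tz in t]
        rows.append(list(c) + [scale] + values)
    return rows


def trilinear(values, k, g):
    """Trilinearly interpolate a flattened k^3 grid at continuous index g."""
    i0 = [min(int(math.floor(gi)), k - 2) for gi in g]
    f = [gi - i for gi, i in zip(g, i0)]
    out = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dl in (0, 1):
                w = ((f[0] if di else 1.0 - f[0])
                     * (f[1] if dj else 1.0 - f[1])
                     * (f[2] if dl else 1.0 - f[2]))
                idx = ((i0[0] + di) * k + (i0[1] + dj)) * k + (i0[2] + dl)
                out += w * values[idx]
    return out


def query(rows, k, x):
    """Average the interpolated SDF over all grids whose cube contains x.

    Returns None when x lies outside every local grid (far from the surface).
    """
    estimates = []
    for row in rows:
        c, s, values = row[:3], row[3], row[4:]
        u = [(x[a] - c[a]) / s for a in range(3)]  # local coords in [-1, 1]
        if all(-1.0 <= ua <= 1.0 for ua in u):
            g = [(ua + 1.0) * 0.5 * (k - 1) for ua in u]
            estimates.append(trilinear(values, k, g))
    return sum(estimates) / len(estimates) if estimates else None
```

Because every row has the same length, the n local grids stack into an n × (4 + k^3) matrix, i.e. a set of equal-sized tokens that a Transformer can consume directly. Each shape is encoded independently of every other shape, which is what makes converting a large dataset embarrassingly parallel.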