Diffusion probabilistic models have achieved remarkable success in text-guided image generation. However, generating 3D shapes remains challenging because few datasets pair 3D models with textual descriptions. Moreover, text-based descriptions of 3D shapes are inherently ambiguous and lack detail. In this paper, we propose a sketch- and text-guided probabilistic diffusion model for colored point cloud generation that conditions the denoising process jointly on a hand-drawn sketch of the object and its textual description. We incrementally diffuse the point coordinates and color values in a joint diffusion process until they reach a Gaussian distribution. Colored point cloud generation thus amounts to learning the reverse diffusion process, conditioned on the sketch and text, to iteratively recover the desired shape and color. Specifically, to learn an effective sketch-text embedding, our model adaptively aggregates the joint embedding of the text prompt and the sketch using a capsule attention network. Our model uses staged diffusion: it first generates the shape and then assigns colors to different parts conditioned on the appearance prompt, while preserving the precise shape from the first stage. This gives our model the flexibility to extend to multiple tasks, such as appearance re-editing and part segmentation. Experimental results demonstrate that our model outperforms recent state-of-the-art methods in point cloud generation.
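The joint forward diffusion of coordinates and colors described in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM closed-form noising step, a linear noise schedule with 1000 steps, and colors rescaled to [-1, 1]; none of these choices is stated in the abstract, and the function names `make_alpha_bar` and `diffuse_colored_points` are hypothetical. The sketch covers only the forward (noising) direction, not the sketch- and text-conditioned reverse process or the staged shape/color generation.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse_colored_points(points_xyzrgb, t, alpha_bar, rng):
    """Noise an (N, 6) array of [x, y, z, r, g, b] rows to step t in one shot.

    Coordinates and colors share the same noise level, which is one simple
    way to realize a joint diffusion over both (an assumption of this sketch).
    """
    noise = rng.standard_normal(points_xyzrgb.shape)
    signal_scale = np.sqrt(alpha_bar[t])
    noise_scale = np.sqrt(1.0 - alpha_bar[t])
    return signal_scale * points_xyzrgb + noise_scale * noise, noise

rng = np.random.default_rng(0)
# Toy colored point cloud: 2048 points, coordinates and colors in [-1, 1].
cloud = rng.uniform(-1.0, 1.0, size=(2048, 6))
alpha_bar = make_alpha_bar()

slightly_noisy, _ = diffuse_colored_points(cloud, t=0, alpha_bar=alpha_bar, rng=rng)
fully_noisy, eps = diffuse_colored_points(cloud, t=999, alpha_bar=alpha_bar, rng=rng)
```

At the last step almost no signal is retained, so `fully_noisy` is approximately a sample from a standard Gaussian. A denoising network would then be trained to predict `eps` from the noisy cloud, the step index, and the conditioning embedding.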