This paper presents PolyDiffuse, a novel structured reconstruction algorithm that transforms visual sensor data into polygonal shapes using Diffusion Models (DM), an emerging class of generative models, by formulating reconstruction as a generation process conditioned on sensor data. The task of structured reconstruction poses two fundamental challenges to DM: 1) a structured geometry is a ``set'' (e.g., a set of polygons for a floorplan geometry), where a sample of $N$ elements has $N!$ different but equivalent representations, making the denoising highly ambiguous; and 2) a ``reconstruction'' task has a single solution, so the initial noise must be chosen carefully, whereas any initial noise works for a generation task. Our technical contribution is a Guided Set Diffusion Model in which 1) the forward diffusion process learns guidance networks that control noise injection so that one representation of a sample remains distinct from its other permutation variants, thus resolving the denoising ambiguity; and 2) the reverse denoising process reconstructs polygonal shapes, initialized and directed by the guidance networks, as a conditional generation process subject to the sensor data. We evaluate our approach on reconstructing two types of polygonal shapes: floorplans as sets of polygons and HD maps for autonomous cars as sets of polylines. Through extensive experiments on standard benchmarks, we demonstrate that PolyDiffuse significantly advances the current state of the art and enables broader practical applications.
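The permutation ambiguity named in the first challenge can be made concrete with a small sketch. The code below is an illustration only, not the PolyDiffuse method: the toy floorplan, the function name `permutation_variants`, and the choice of three rectangular rooms are all hypothetical. It enumerates the $N!$ orderings of a set of polygons and shows that averaging over them, which is roughly what a denoiser regressing onto arbitrarily ordered targets would tend toward, assigns the same blurred polygon to every slot.

```python
import itertools
import math

import numpy as np


def permutation_variants(polygons):
    """Return every ordering of a list of polygons (N! orderings for N elements)."""
    return [list(p) for p in itertools.permutations(polygons)]


# Hypothetical toy floorplan: three rooms, each a 4-vertex polygon of (x, y) corners.
rooms = [
    np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float),
    np.array([[2, 0], [3, 0], [3, 1], [2, 1]], dtype=float),
    np.array([[0, 2], [1, 2], [1, 3], [0, 3]], dtype=float),
]

variants = permutation_variants(rooms)
num_variants = len(variants)

# Stack every ordering into an array of shape (N!, N, 4, 2).
stacked = np.stack([np.stack(v) for v in variants])

# Averaging over all equivalent orderings gives each slot the same mean polygon,
# so the per-slot target carries no information about which room it should be.
mean_over_orderings = stacked.mean(axis=0)
slots_collapse = all(
    np.allclose(mean_over_orderings[0], mean_over_orderings[i])
    for i in range(1, len(rooms))
)

print(num_variants, slots_collapse)
```

Under these assumptions, fixing one canonical representation per sample (the role the abstract assigns to the guidance networks) is what keeps the per-slot targets distinct.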