We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework for generating, assessing, and editing sparse voxels in SLat (Structured LATent)-based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, explicit categorical modeling makes DVD's generation dynamics more interpretable. Furthermore, we use predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and complicated samples, which supports tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This strategy enables the model to inpaint and edit voxels within a single sampling round, with negligible auxiliary computation and no additional model evaluations. Code is available at https://github.com/TeCai/DVD.
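As a rough illustration of the predictive-entropy idea, the sketch below computes a per-voxel entropy from categorical logits and aggregates it into a per-sample score. It is not taken from the DVD code base: the function names, the sparse `dict` layout mapping `(x, y, z)` to logits, the use of the mean as the aggregate, and the `0.5` nat threshold are all assumptions made for this example.

```python
import math


def categorical_entropy(logits):
    """Entropy in nats of softmax(logits) for a single voxel."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    h = 0.0
    for e in exps:
        p = e / z
        if p > 0.0:  # skip zero-probability terms (0 * log 0 := 0)
            h -= p * math.log(p)
    return h


def entropy_map(voxel_logits):
    """Map each voxel coordinate to its predictive entropy.

    voxel_logits: dict mapping (x, y, z) -> sequence of K logits
    (hypothetical layout; K = 2 for empty/occupied).
    """
    return {xyz: categorical_entropy(l) for xyz, l in voxel_logits.items()}


def sample_uncertainty(voxel_logits):
    """Mean per-voxel entropy, used here as an assumed per-sample score."""
    h = entropy_map(voxel_logits)
    return sum(h.values()) / len(h) if h else 0.0


def ambiguous_voxels(voxel_logits, threshold=0.5):
    """Coordinates whose entropy exceeds an illustrative threshold (nats)."""
    return sorted(xyz for xyz, h in entropy_map(voxel_logits).items()
                  if h > threshold)


# Toy example: two confident voxels and one maximally uncertain voxel.
toy_logits = {
    (0, 0, 0): (4.0, -4.0),   # confidently "empty"
    (0, 0, 1): (0.0, 0.0),    # 50/50, entropy = ln 2
    (0, 1, 1): (-3.0, 3.0),   # confidently "occupied"
}
print(ambiguous_voxels(toy_logits))  # → [(0, 0, 1)]
```

With binary occupancy the entropy is bounded by ln 2 ≈ 0.693 nats, so any threshold has to be chosen relative to that ceiling. A per-sample score such as `sample_uncertainty` could then be used to rank samples for data filtering, under the assumptions stated above.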