Score-based diffusion models have emerged as prominent deep generative models. Applying them to tabular data remains challenging, however, because their backbones assume fully specified inputs, whereas real-world tabular data often contain missing values. We propose AugMask, a plug-and-play training framework that adapts missing-unaware backbones to incomplete data by separating conditioning from supervision. AugMask 1) constructs complete numeric inputs via conditional stochastic augmentation, using lightweight auxiliary models to fill missing entries, and 2) applies denoising supervision only to observed coordinates. In effect, the augmented missing entries serve as uncertain conditioning context rather than as training targets. We connect this training rule to a Rao-Blackwellized objective and show that marginalizing over the missing entries yields a variance-weighted sensitivity penalty, which discourages over-reliance on uncertain completions. Across diverse datasets and missingness regimes, AugMask enables standard diffusion-based tabular generators to outperform specialized missing-aware baselines.
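The two ingredients of the training rule (augment missing entries, supervise only observed ones) and the claimed variance-weighted penalty can be illustrated with a small NumPy sketch. It is not the authors' implementation and rests on several assumptions: the auxiliary model is replaced by an unconditional per-column Gaussian N(`mu`, `var`) rather than a conditional sampler, the backbone is a toy linear noise predictor `eps_hat = x_t @ W`, the perturbation is `x_t = x0 + sigma * eps`, and only numeric columns are handled. All function names (`augment_missing`, `masked_denoising_loss`, `training_loss`, `marginalized_loss`) are hypothetical. For this linear denoiser, averaging the masked loss over the random fills equals the loss at the mean fill plus a penalty of `var_j * W[j, k]**2` summed over missing inputs `j` and observed outputs `k`, i.e. each missing input's variance weighted by the squared sensitivity of an observed output to it; for a nonlinear backbone a term of this form would appear only approximately.

```python
import numpy as np


def masked_denoising_loss(eps_hat, eps, obs_mask):
    """Squared error averaged over observed coordinates only (obs_mask == 1).

    Missing coordinates contribute nothing to the supervision signal.
    """
    sq = (eps_hat - eps) ** 2 * obs_mask
    return float(sq.sum() / max(obs_mask.sum(), 1.0))


def augment_missing(x, obs_mask, mu, var, rng):
    """Keep observed entries; replace missing ones with random draws.

    Assumption: an unconditional per-column Gaussian N(mu_j, var_j) stands in
    for AugMask's conditional auxiliary models.
    """
    draws = mu + np.sqrt(var) * rng.standard_normal(x.shape)
    return np.where(obs_mask == 1, x, draws)


def training_loss(x, obs_mask, W, sigma, mu, var, rng):
    """One stochastic loss evaluation for a toy linear denoiser eps_hat = x_t @ W."""
    x0 = augment_missing(x, obs_mask, mu, var, rng)  # filled entries: context only
    eps = rng.standard_normal(x.shape)
    x_t = x0 + sigma * eps                           # Gaussian perturbation at level sigma
    return masked_denoising_loss(x_t @ W, eps, obs_mask)


def marginalized_loss(x, obs_mask, W, sigma, eps, mu, var):
    """Closed-form expectation over the random fills (exact for the linear denoiser).

    Returns (loss at the mean fill + penalty, penalty), where the penalty sums
    var_j * W[j, k]**2 over missing inputs j and observed outputs k.
    """
    x_mean = np.where(obs_mask == 1, x, mu)
    base = masked_denoising_loss((x_mean + sigma * eps) @ W, eps, obs_mask)
    miss_var = (1 - obs_mask) * var       # (i, j): var_j if entry j of row i is missing
    sens = miss_var @ (W ** 2)            # (i, k): sum_j var_j * W[j, k]**2
    penalty = float((sens * obs_mask).sum() / obs_mask.sum())
    return base + penalty, penalty


# Toy check: average the masked loss over many random fills (noise eps held fixed)
# and compare it with the closed-form marginalized loss.
rng = np.random.default_rng(0)
n, d, sigma = 4, 3, 0.5
obs_mask = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)
x = rng.standard_normal((n, d))
x[obs_mask == 0] = np.nan                            # mark missing values
mu, var = np.zeros(d), np.array([0.5, 1.0, 2.0])
W = 0.3 * rng.standard_normal((d, d))
eps = rng.standard_normal((n, d))

exact, penalty = marginalized_loss(x, obs_mask, W, sigma, eps, mu, var)
fills = [augment_missing(x, obs_mask, mu, var, rng) for _ in range(20000)]
mc = np.mean([masked_denoising_loss((f + sigma * eps) @ W, eps, obs_mask) for f in fills])
print(f"closed form {exact:.4f}  Monte Carlo {mc:.4f}  penalty {penalty:.4f}")
```

The closed-form version is the Rao-Blackwellized counterpart of the sampled loss: it replaces the random fill with its expectation, so it has no sampling variance. The penalty grows with both the fill variance and the denoiser's sensitivity to the filled inputs, which is the mechanism the abstract describes for discouraging reliance on uncertain completions.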