Continuous Diffusion for Mixed-Type Tabular Data

Score-based generative models (or diffusion models for short) have proven successful for generating text and image data. However, the adaptation of this model family to mixed-type tabular data has fallen short so far. In this paper, we propose CDTD, a Continuous Diffusion model for mixed-type Tabular Data. Specifically, we combine score matching and score interpolation to ensure a common continuous noise distribution for continuous and categorical features alike. We counteract the high heterogeneity inherent to mixed-type data with distinct, adaptive noise schedules per feature or per data type. The learnable noise schedules ensure optimally allocated model capacity and balanced generative capability. We homogenize the data types further with model-specific loss calibration and initialization schemes tailored to mixed-type tabular data. Our experimental results show that CDTD consistently outperforms state-of-the-art benchmark models, that it captures feature correlations exceptionally well, and that heterogeneity in the noise schedule design boosts sample quality.
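To make the idea of a common continuous noise distribution concrete, the following is a minimal sketch of a forward noising step for one table row. It is not the authors' implementation. It assumes that each categorical feature is represented by an embedding vector (as is typical for score interpolation) and that each feature has its own noise level. The function names, the embedding tables, and the sigma values are all hypothetical and chosen for illustration only.

```python
import random


def add_gaussian_noise(vec, sigma, rng):
    """Return vec with independent N(0, sigma^2) noise added to each entry."""
    return [v + sigma * rng.gauss(0.0, 1.0) for v in vec]


def noise_mixed_row(x_cont, x_cat, embeddings, sigma_cont, sigma_cat, rng):
    """Noise one row of mixed-type data in a shared continuous space.

    x_cont:     list of continuous feature values
    x_cat:      list of category indices, one per categorical feature
    embeddings: one embedding table per categorical feature (hypothetical)
    sigma_cont: one noise level per continuous feature (assumed given)
    sigma_cat:  one noise level per categorical feature (assumed given)
    """
    # Continuous features: add Gaussian noise directly to the values.
    noisy_cont = [x + s * rng.gauss(0.0, 1.0) for x, s in zip(x_cont, sigma_cont)]
    # Categorical features: look up the embedding, then add the same kind
    # of Gaussian noise, so both data types share one noise distribution.
    noisy_cat = [
        add_gaussian_noise(table[idx], s, rng)
        for idx, table, s in zip(x_cat, embeddings, sigma_cat)
    ]
    return noisy_cont, noisy_cat


# Toy example: two continuous features and two categorical features
# with three and two categories, each embedded in two dimensions.
rng = random.Random(0)
embeddings = [
    [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]],
    [[1.0, 0.0], [0.0, 1.0]],
]
noisy_cont, noisy_cat = noise_mixed_row(
    x_cont=[0.5, -1.2],
    x_cat=[2, 0],
    embeddings=embeddings,
    sigma_cont=[0.1, 1.0],
    sigma_cat=[0.5, 2.0],
    rng=rng,
)
```

Here the per-feature noise levels are fixed numbers. In the abstract they come from adaptive, learnable noise schedules, which this sketch does not attempt to reproduce.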

