The assumption of conditional independence among observed variables, commonly made when modeling the decoder of a Variational Autoencoder (VAE), has limitations when dealing with high-dimensional datasets or complex correlation structures among the observed variables. To address this issue, we introduce a Cramer-Wold distance regularization, which can be computed in closed form, to facilitate joint distributional learning for high-dimensional datasets. Additionally, we introduce a two-step learning method that enables flexible prior modeling and improves the alignment between the aggregated posterior and the prior distribution. We also explain how our approach differs theoretically from existing methods in this category. To evaluate the synthetic data generation performance of the proposed approach, we conduct experiments on high-dimensional datasets with multiple categorical variables. Because many readily available datasets and data science applications involve data of this kind, these experiments demonstrate the practical effectiveness of the proposed methodology.
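To make the phrase "computed in closed form" concrete, the following is a minimal sketch of a sample-based squared Cramer-Wold distance between two equally sized batches. It is not taken from this paper: it assumes the closed-form expression commonly given in the Cramer-Wold autoencoder literature, an asymptotic approximation of Kummer's function that is usually considered reasonable for dimension 20 or higher, and a Silverman-style default for the smoothing parameter. The names `phi_d`, `pairwise_sq_dists`, `cramer_wold_sq`, and `gamma` are illustrative.

```python
import numpy as np


def phi_d(s, dim):
    """Asymptotic approximation of Kummer's function 1F1(1/2; dim/2; -s).

    Assumption: this approximation is only intended for roughly dim >= 20.
    """
    return 1.0 / np.sqrt(1.0 + 4.0 * s / (2.0 * dim - 3.0))


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def cramer_wold_sq(x, y, gamma=None):
    """Squared Cramer-Wold distance between two samples of shape (n, dim).

    Assumption: both samples have the same shape, and gamma (the Gaussian
    smoothing parameter) defaults to a Silverman-style rule of thumb.
    """
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape (n, dim)")
    n, dim = x.shape
    if gamma is None:
        gamma = (4.0 / (3.0 * n)) ** 0.4

    # Three kernel sums: within x, within y, and across x and y.
    xx = phi_d(pairwise_sq_dists(x, x) / (4.0 * gamma), dim).sum()
    yy = phi_d(pairwise_sq_dists(y, y) / (4.0 * gamma), dim).sum()
    xy = phi_d(pairwise_sq_dists(x, y) / (4.0 * gamma), dim).sum()

    return (xx + yy - 2.0 * xy) / (2.0 * n ** 2 * np.sqrt(np.pi * gamma))
```

In a VAE setting, one plausible use is to pass a batch of observations as `x` and a batch of decoder samples as `y`, and add the result to the training loss as a regularization term. The weighting and the exact placement of that term are design choices this sketch does not cover.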