Normalizing flows are powerful non-parametric statistical models that serve as both density estimators and generative models. Current learning algorithms for normalizing flows assume that data points are sampled independently, an assumption that is frequently violated in practice and that may lead to erroneous density estimation and data generation. We propose a likelihood objective for normalizing flows that incorporates dependencies between data points, and we derive a flexible and efficient learning algorithm for it that is suitable for different dependency structures. We show that respecting dependencies between observations can improve empirical results on both synthetic and real-world data, and that it leads to higher statistical power in a downstream application to genome-wide association studies.
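
To make the contrast between the two objectives concrete, here is a minimal sketch of one way a dependency-aware likelihood could look. It is an illustration under stated assumptions, not necessarily the formulation used in this work: the flow is a toy one-dimensional affine map, the latent values of all n observations are assumed to be jointly Gaussian with a known n × n covariance matrix `cov`, and the function names (`affine_flow`, `loglik_independent`, `loglik_dependent`) are hypothetical.

```python
import numpy as np


def affine_flow(x, scale, shift):
    """Toy 1-D flow z = scale * x + shift (an assumption for illustration).

    Returns the latent values and log|dz/dx| for every data point.
    """
    z = scale * x + shift
    log_det = np.full_like(x, np.log(np.abs(scale)), dtype=float)
    return z, log_det


def loglik_independent(x, scale, shift):
    """Standard objective: latents are i.i.d. standard normal."""
    z, log_det = affine_flow(x, scale, shift)
    log_base = -0.5 * (z**2 + np.log(2.0 * np.pi))
    return float(np.sum(log_base + log_det))


def loglik_dependent(x, scale, shift, cov):
    """Assumed dependent objective: latents are jointly N(0, cov) across samples."""
    z, log_det = affine_flow(x, scale, shift)
    n = x.shape[0]
    chol = np.linalg.cholesky(cov)        # cov = chol @ chol.T
    w = np.linalg.solve(chol, z)          # whitened latents, w @ w = z^T cov^{-1} z
    log_det_cov = 2.0 * np.sum(np.log(np.diag(chol)))
    log_base = -0.5 * (w @ w + log_det_cov + n * np.log(2.0 * np.pi))
    return float(log_base + np.sum(log_det))


if __name__ == "__main__":
    x = np.array([0.3, -1.2, 0.8, 2.0])
    # Hypothetical dependency structure: all four samples equally correlated.
    cov = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
    print("independent:", loglik_independent(x, 1.5, -0.2))
    print("dependent:  ", loglik_dependent(x, 1.5, -0.2, cov))
```

With `cov` set to the identity matrix the two functions return the same value, so in this sketch the independent objective is the special case of no dependencies.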