Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows. We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.
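As background for the notion of a copula used above, the following minimal sketch illustrates the separation between dimension-wise distributions and inter-variable dependence. It is not the method proposed in this work: it only computes rank-based pseudo-observations, a standard empirical approximation of the probability integral transform. The function name `pseudo_observations`, the synthetic Gaussian data, and the chosen correlation of 0.8 are assumptions made purely for illustration.

```python
import numpy as np


def pseudo_observations(x):
    """Map each column of x into (0, 1) using its ranks.

    Hypothetical helper for illustration: an empirical stand-in for
    applying each marginal CDF to its own dimension.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # argsort of argsort gives 0-based ranks per column
    # (assumes continuous data, so ties are not expected).
    ranks = np.argsort(np.argsort(x, axis=0), axis=0) + 1
    # Divide by n + 1 so values stay strictly inside (0, 1).
    return ranks / (n + 1)


rng = np.random.default_rng(0)

# Synthetic dependent data (assumed for illustration only).
z = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=2000)

# Strictly increasing transforms change the marginals but not the ranks.
x = np.column_stack([np.exp(z[:, 0]), z[:, 1] ** 3])

u_z = pseudo_observations(z)
u_x = pseudo_observations(x)

# Correlation of the pseudo-observations (Spearman's rank correlation).
rank_corr = np.corrcoef(u_x[:, 0], u_x[:, 1])[0, 1]
```

In this sketch, `u_x` and `u_z` agree because ranks are invariant under strictly increasing maps: each column is approximately uniform on (0, 1), while the dependence between columns is retained. That retained dependence structure on uniform marginals is what a copula model describes.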