In this paper, we propose Wasserstein proximals of $\alpha$-divergences as suitable objective functionals for learning heavy-tailed distributions in a stable manner. First, we provide sufficient, and in some cases necessary, relations among the data dimension, $\alpha$, and the decay rate of the data distribution for the Wasserstein-proximal-regularized divergence to be finite. We then provide finite-sample convergence rates for estimation with the Wasserstein-1 proximal divergences under certain tail conditions. Numerical experiments demonstrate stable learning of heavy-tailed distributions -- even those without a first or second moment -- without any explicit knowledge of the tail behavior, using suitable generative models, such as GANs and flow-based models, built on our proposed Wasserstein-proximal-regularized $\alpha$-divergences. Heuristically, the $\alpha$-divergences handle the heavy tails, while the Wasserstein proximals allow for distributions that are not absolutely continuous with respect to one another and control the velocities of flow-based algorithms as they learn the target distribution deep into the tails.
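For intuition, a minimal sketch of the infimal-convolution form that such a proximal regularization typically takes (the notation $\eta$, $\Lambda$, and the exact scaling below are illustrative assumptions, not necessarily the paper's conventions):
\[
D_\alpha^{\Lambda W_1}(P \,\|\, Q) \;:=\; \inf_{\eta \in \mathcal{P}(\mathbb{R}^d)} \Big\{ D_\alpha(\eta \,\|\, Q) \;+\; \Lambda\, W_1(P, \eta) \Big\},
\]
where $D_\alpha$ is the $\alpha$-divergence, $W_1$ is the Wasserstein-1 distance, and $\Lambda > 0$ is a penalty parameter. In this form, the $W_1$ term relaxes the absolute-continuity requirement between the compared distributions, while the $D_\alpha$ term is the component suited to heavy tails; the trade-off between the two is set by $\Lambda$.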