The ratio of two probability density functions is a fundamental quantity that appears in many areas of statistics and machine learning, including causal inference, reinforcement learning, covariate shift, outlier detection, independence testing, importance sampling, and diffusion modeling. Naively estimating the numerator and denominator densities separately using, e.g., kernel density estimators can lead to unstable performance and suffers from the curse of dimensionality as the number of covariates increases. For this reason, several methods have been developed for estimating the density ratio directly, based on (a) Bregman divergences or (b) recasting the density ratio as the odds in a probabilistic classification model that predicts whether an observation was sampled from the numerator or the denominator distribution. Additionally, the density ratio can be viewed as the Riesz representer of a continuous linear map, making it amenable to estimation via (c) minimization of the so-called Riesz loss, originally developed to learn the Riesz representer in the Riesz regression procedure of causal inference. In this paper we show that all three of these methods can be unified in a common framework, which we call Bregman-Riesz regression. We further show how data augmentation techniques can be used to apply density ratio learning methods to causal problems, where the numerator distribution typically represents an unobserved intervention. Through simulations, we show how the choice of Bregman divergence and data augmentation strategy can affect the performance of the resulting density ratio learner. A Python package is provided for researchers to apply Bregman-Riesz regression in practice using gradient boosting, neural networks, and kernel methods.
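To make routes (b) and (c) concrete, here is a minimal NumPy sketch on a toy problem where the true ratio is known. It is not the paper's Python package or its API. The Gaussian setup, the polynomial basis, and the names `features`, `classifier_ratio`, and `riesz_ratio` are all illustrative assumptions. Route (b) fits a logistic regression that labels numerator draws 1 and denominator draws 0, then converts the predicted odds into a ratio. The odds equal the density ratio only up to the factor n_num / n_den, so that factor is divided out. Route (c) minimizes the empirical Riesz loss E_q[r(X)^2] - 2 E_p[r(X)] over a linear-in-features model. This is the same squared-error objective used in least-squares density-ratio fitting, and here it has a closed-form solution.

```python
import numpy as np

# Toy setup (an assumption for illustration): numerator p = N(0.5, 1),
# denominator q = N(0, 1), so the true ratio is p(x)/q(x) = exp(0.5 x - 0.125).
rng = np.random.default_rng(0)
n = 20_000
x_num = rng.normal(0.5, 1.0, n)  # draws from the numerator p
x_den = rng.normal(0.0, 1.0, n)  # draws from the denominator q


def true_ratio(x):
    """Closed-form p(x)/q(x) for the two unit-variance Gaussians above."""
    return np.exp(0.5 * np.asarray(x) - 0.125)


def features(x, degree):
    """Hypothetical polynomial basis [1, x, ..., x**degree], one row per point."""
    return np.vander(np.atleast_1d(x), degree + 1, increasing=True)


def classifier_ratio(x_num, x_den, degree=1, iters=25):
    """Route (b): fit logistic regression (label 1 = numerator) by Newton's
    method, then turn the predicted odds into a density ratio."""
    X = features(np.concatenate([x_num, x_den]), degree)
    y = np.concatenate([np.ones(len(x_num)), np.zeros(len(x_den))])
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ theta))
        grad = X.T @ (y - prob)                             # score of the log-likelihood
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None])   # negative Hessian
        theta += np.linalg.solve(hess, grad)                # Newton step
    # odds = (n_num * p) / (n_den * q), so rescale by n_den / n_num
    scale = len(x_den) / len(x_num)
    return lambda x: scale * np.exp(features(x, degree) @ theta)


def riesz_ratio(x_num, x_den, degree=2):
    """Route (c): minimise the empirical Riesz loss
    mean_q[r(X)^2] - 2 * mean_p[r(X)] over r(x) = features(x) @ theta;
    setting the gradient to zero gives the linear system G theta = h."""
    phi_den = features(x_den, degree)
    G = phi_den.T @ phi_den / len(x_den)        # empirical E_q[phi phi^T]
    h = features(x_num, degree).mean(axis=0)    # empirical E_p[phi]
    theta = np.linalg.solve(G, h)
    return lambda x: features(x, degree) @ theta


r_clf = classifier_ratio(x_num, x_den)
r_riesz = riesz_ratio(x_num, x_den)
grid = np.array([-1.0, 0.0, 1.0])
print("true      ", np.round(true_ratio(grid), 3))
print("classifier", np.round(r_clf(grid), 3))
print("riesz loss", np.round(r_riesz(grid), 3))
```

In this toy case the log-odds are exactly linear in x, so the classifier is well-specified and should track the true ratio closely. The quadratic Riesz-loss fit is only the L2(q) projection of an exponential onto polynomials, so some bias in the tails is expected. Because the basis contains an intercept, its fitted ratio averages exactly to 1 over the denominator sample. Richer learners, such as the gradient boosting, neural network, and kernel models the abstract mentions, would replace the fixed polynomial basis.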