We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $\mu$ and $\nu$ to a rank histogram on $\{0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding a rank-statistic divergence estimator. We prove that the resulting estimator is monotone in $K$ and always lower-bounds the true $f$-divergence, and we establish quantitative convergence rates as $K \to \infty$ under mild regularity of the quantile-domain density ratio. To handle high-dimensional data, we define the sliced rank-statistic $f$-divergence by averaging the univariate construction over random projections, and we provide convergence results for the sliced limit as well. We also derive finite-sample deviation bounds along with asymptotic normality results for the estimator. Finally, we empirically validate the approach by benchmarking against neural baselines and illustrating its use as a learning objective in generative modeling experiments.
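To make the univariate construction concrete, the following is a minimal sketch of one plausible reading of it, not the reference implementation. It assumes that the rank of a draw from $\mu$ is the number of $K$ independent draws from $\nu$ that do not exceed it (so the rank is uniform on $\{0, \ldots, K\}$ when $\mu = \nu$), that the histogram is estimated by resampling from two fixed samples, and that the discrete $f$-divergence is taken with the uniform distribution as reference. The function names `rank_histogram`, `discrete_f_divergence`, and `kl_generator` are hypothetical.

```python
import numpy as np


def rank_histogram(x, y, K, n_trials, rng):
    """Empirical rank distribution on {0, ..., K} (assumed construction).

    Each trial draws one point from the sample x (standing in for mu) and
    K points from the sample y (standing in for nu), then records how many
    of the K points are <= the point from x.
    """
    xs = rng.choice(x, size=n_trials)             # one draw from mu per trial
    ys = rng.choice(y, size=(n_trials, K))        # K draws from nu per trial
    ranks = (ys <= xs[:, None]).sum(axis=1)       # integer ranks in {0, ..., K}
    return np.bincount(ranks, minlength=K + 1) / n_trials


def kl_generator(t):
    """Generator f(t) = t log t of the KL divergence, with 0 log 0 := 0."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out


def discrete_f_divergence(p, f):
    """Discrete f-divergence of the histogram p from the uniform distribution."""
    u = 1.0 / p.size                              # uniform mass on K + 1 bins
    return float(np.sum(u * f(p / u)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=1.0, size=5000)            # sample standing in for mu
    y = rng.normal(loc=0.0, size=5000)            # sample standing in for nu
    hist = rank_histogram(x, y, K=8, n_trials=20000, rng=rng)
    print(discrete_f_divergence(hist, kl_generator))
```

Under these assumptions, a sliced variant would apply the same routine to one-dimensional projections of the data along random unit directions and average the resulting values.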