We introduce a rank-statistic approximation of $f$-divergences that avoids explicit density-ratio estimation by working directly with the distribution of ranks. For a resolution parameter $K$, we map the mismatch between two univariate distributions $\mu$ and $\nu$ to a rank histogram on $\{0, \ldots, K\}$ and measure its deviation from uniformity via a discrete $f$-divergence, yielding a rank-statistic divergence estimator. We prove that the resulting divergence is monotone in $K$ and always lower-bounds the true $f$-divergence, and we establish quantitative convergence rates as $K \to \infty$ under mild regularity of the quantile-domain density ratio. To handle high-dimensional data, we define the sliced rank-statistic $f$-divergence by averaging the univariate construction over random projections, and we provide convergence results for the sliced limit as well. We further derive finite-sample deviation bounds and asymptotic normality results for the estimator. Finally, we validate the approach empirically, benchmarking against neural baselines and illustrating its use as a learning objective in generative-modelling experiments.
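The following is a minimal sketch of the construction described above, under one illustrative rank protocol: each $\nu$-sample is ranked against $K$ reference points subsampled from the $\mu$-sample, so the rank lies in $\{0, \ldots, K\}$ and is uniform when $\mu = \nu$. The function names and this specific protocol are assumptions for illustration, not the paper's exact estimator.

```python
import numpy as np

def rank_statistic_f_divergence(x, y, K, f, rng=None):
    """Univariate rank-statistic f-divergence (illustrative sketch).

    x, y : 1-D samples from mu and nu; requires len(x) >= K.
    K    : resolution parameter.
    f    : convex generator with f(1) = 0, e.g. lambda t: t * np.log(t) for KL.
    """
    rng = np.random.default_rng(rng)
    ranks = np.empty(len(y), dtype=int)
    for j, yj in enumerate(y):
        # Rank the j-th nu-sample among K reference points drawn from the
        # mu-sample; the resulting rank lands in {0, ..., K}.
        ref = rng.choice(x, size=K, replace=False)
        ranks[j] = np.sum(ref < yj)
    # Empirical rank histogram on {0, ..., K}.
    hist = np.bincount(ranks, minlength=K + 1) / len(y)
    # Discrete f-divergence of the histogram from the uniform law on K+1 bins.
    # Empty bins are dropped, which is exact for generators with f(0) = 0
    # (as for KL); other generators would need an explicit f(0) term.
    u = 1.0 / (K + 1)
    return float(sum(u * f(p / u) for p in hist if p > 0))

def sliced_rank_statistic_f_divergence(X, Y, K, f, n_proj=64, rng=None):
    """Sliced variant: average the univariate estimator over random
    one-dimensional projections of the d-dimensional samples X, Y."""
    rng = np.random.default_rng(rng)
    dirs = rng.normal(size=(n_proj, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return float(np.mean([
        rank_statistic_f_divergence(X @ theta, Y @ theta, K, f, rng=rng)
        for theta in dirs
    ]))
```

With `f = lambda t: t * np.log(t)` this returns a KL-type rank-statistic divergence; per the abstract, the population quantity lower-bounds the true $f$-divergence and tightens as $K$ grows, while the sample version carries the finite-sample fluctuations the deviation bounds control.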