The discrepancy between two independent samples \(X_1,\dots,X_n\) and \(Y_1,\dots,Y_n\) drawn from the same distribution on \(\mathbb{R}^d\) typically has order \(O(\sqrt{n})\) even in one dimension. We give a simple online algorithm that reduces the discrepancy to \(O(\log^{2d} n)\) by discarding a small fraction of the points.
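To make the \(\sqrt{n}\) baseline concrete, the following is a minimal Python sketch of the one-dimensional case before any points are discarded. The abstract does not spell out its definition of discrepancy, so the sketch assumes one: the largest absolute difference, over all thresholds \(t\), between the number of \(X_i \le t\) and the number of \(Y_i \le t\). The function names `prefix_discrepancy` and `mean_discrepancy` are hypothetical, and the code does not implement the online discarding algorithm itself.

```python
import math
import random


def prefix_discrepancy(xs, ys):
    """Max over thresholds t of |#{x <= t} - #{y <= t}|.

    This definition is an assumption made for illustration; the source
    does not state which family of sets the discrepancy ranges over.
    """
    # Tag each point with +1 (from xs) or -1 (from ys) and sort by value.
    tagged = sorted([(x, 1) for x in xs] + [(y, -1) for y in ys])
    running = 0
    worst = 0
    for i, (value, sign) in enumerate(tagged):
        running += sign
        # Only evaluate once all points sharing this value are counted,
        # so that ties between the samples are handled correctly.
        last_of_value = i + 1 == len(tagged) or tagged[i + 1][0] != value
        if last_of_value:
            worst = max(worst, abs(running))
    return worst


def mean_discrepancy(n, trials=200, seed=0):
    """Average prefix discrepancy of two uniform samples of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = [rng.random() for _ in range(n)]
        total += prefix_discrepancy(xs, ys)
    return total / trials


if __name__ == "__main__":
    # If the discrepancy grows like sqrt(n), the last column should
    # stay roughly constant as n increases.
    for n in (100, 400, 1600):
        avg = mean_discrepancy(n)
        print(n, round(avg, 2), round(avg / math.sqrt(n), 3))
```

Running the script prints, for each sample size, the average discrepancy and its ratio to \(\sqrt{n}\). Under the assumed definition this quantity is \(n\) times the two-sample Kolmogorov-Smirnov statistic, which is why \(\sqrt{n}\) growth is expected for samples from a common continuous distribution.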