When two different parties use the same learning rule on their own data, how can we test whether the distributions of the two outcomes are similar? In this paper, we study the similarity of outcomes of learning rules through the lens of the Total Variation (TV) distance of distributions. We say that a learning rule is TV indistinguishable if, when it is executed on two training data sets drawn independently from the same distribution, the expected TV distance between the posterior distributions of its outputs is small. We first investigate the learnability of hypothesis classes using TV indistinguishable learners. Our main results are information-theoretic equivalences between TV indistinguishability and existing algorithmic stability notions such as replicability and approximate differential privacy. Then, we provide statistical amplification and boosting algorithms for TV indistinguishable learners.
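To make the definition concrete, the following is a minimal sketch, not the paper's construction, that estimates the expected TV distance by Monte Carlo for a toy randomized learner. Everything named here is an assumption introduced for illustration: `toy_learner` (which outputs label 1 with probability equal to the sample mean), `bernoulli_sampler`, and `estimate_expected_tv` are hypothetical, and the output space is assumed finite so that the TV distance reduces to half the L1 distance between probability vectors.

```python
import random


def tv_distance(p, q):
    """TV distance between two discrete distributions given as {outcome: prob} dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def toy_learner(sample):
    """Hypothetical randomized learner: returns its output distribution over {0, 1}.

    It outputs label 1 with probability equal to the empirical mean of the sample.
    """
    m = sum(sample) / len(sample)
    return {0: 1.0 - m, 1: m}


def bernoulli_sampler(p, n):
    """Return a function that draws n i.i.d. Bernoulli(p) bits (illustrative data source)."""
    def draw(rng):
        return [1 if rng.random() < p else 0 for _ in range(n)]
    return draw


def estimate_expected_tv(learner, draw_sample, trials, rng):
    """Monte Carlo estimate of E[TV(learner(S), learner(S'))] over independent S, S'."""
    total = 0.0
    for _ in range(trials):
        s1 = draw_sample(rng)  # first party's training set
        s2 = draw_sample(rng)  # second party's independent training set
        total += tv_distance(learner(s1), learner(s2))
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000):
        est = estimate_expected_tv(toy_learner, bernoulli_sampler(0.3, n), 500, rng)
        print(f"n={n}: estimated expected TV distance = {est:.3f}")
```

For this toy learner the TV distance between the two output distributions equals the absolute difference of the two sample means, so the estimate shrinks as the sample size grows. A learner whose output is a deterministic function of a continuous statistic would not behave this way in general.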