Replicability and stability in learning

Replicability is essential in science because it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell ('22) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied to two i.i.d. inputs using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness: an algorithm satisfies this variant if it typically produces the same output when applied to two i.i.d. inputs, with the internal randomness drawn independently in each run. This variant is called global stability and was introduced by Bun, Livni and Moran ('20) in the context of differential privacy. Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be accomplished weakly: the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicable numbers. Our results also imply that, except in trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized. The proof of the impossibility result is based on a topological fixed-point theorem. For every algorithm, we are able to locate a "hard input distribution" by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.
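The difference between the two definitions above (same output with shared internal randomness, versus same output with independent internal randomness) can be illustrated with a small simulation. The following is only a toy sketch and not a construction taken from the paper: the learner `learn_bias`, the helpers `draw_sample` and `agreement_rate`, the grid width, the sample size and the seeds are all hypothetical choices made for illustration. The learner estimates a coin's bias and snaps the estimate to a randomly shifted grid, a rounding idea assumed here as a simple way to make the output depend on the internal randomness.

```python
import math
import random


def draw_sample(p, n, rng):
    """Draw n i.i.d. coin flips with bias p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def learn_bias(sample, rng, width=0.2):
    """Toy learner (hypothetical): estimate the bias, then snap the estimate
    to the centre of a cell in a grid shifted by a random offset."""
    mean = sum(sample) / len(sample)
    offset = rng.uniform(0.0, width)  # the learner's internal randomness
    cell = math.floor((mean - offset) / width)
    return offset + (cell + 0.5) * width


def agreement_rate(p, n, trials, share_randomness, seed=0):
    """Fraction of trials in which two runs on independent samples
    return exactly the same output."""
    data_rng = random.Random(seed)
    hits = 0
    for t in range(trials):
        s1 = draw_sample(p, n, data_rng)
        s2 = draw_sample(p, n, data_rng)
        r1 = random.Random(t)
        # Shared randomness: the second run reuses the same seed.
        # Independent randomness: the second run gets a different seed.
        r2 = random.Random(t) if share_randomness else random.Random(trials + t)
        hits += learn_bias(s1, r1) == learn_bias(s2, r2)
    return hits / trials


if __name__ == "__main__":
    print("shared randomness:     ", agreement_rate(0.3, 2000, 200, True))
    print("independent randomness:", agreement_rate(0.3, 2000, 200, False))
```

Under these assumptions the two runs should agree in most trials when the randomness is shared, and almost never when it is drawn independently, because the output then depends on a fresh continuous offset. The sketch only shows that the two notions differ for this particular toy learner; it says nothing about the impossibility result or the boosting results stated above.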
