Replicability and stability in learning

Replicability is essential in science, as it allows us to validate and verify research findings. Impagliazzo, Lei, Pitassi and Sorrell (2022) recently initiated the study of replicability in machine learning. A learning algorithm is replicable if it typically produces the same output when applied to two i.i.d. inputs (independent samples from the same distribution) using the same internal randomness. We study a variant of replicability that does not involve fixing the randomness: an algorithm satisfies this form of replicability if it typically produces the same output when applied to two i.i.d. inputs without fixing its internal randomness. This variant is called global stability and was introduced by Bun, Livni and Moran (2020) in the context of differential privacy. Impagliazzo et al. showed how to boost any replicable algorithm so that it produces the same output with probability arbitrarily close to 1. In contrast, we demonstrate that for numerous learning tasks, global stability can only be achieved weakly: the same output is produced only with probability bounded away from 1. To overcome this limitation, we introduce the concept of list replicability, which is equivalent to global stability. Moreover, we prove that list replicability can be boosted so that it is achieved with probability arbitrarily close to 1. We also describe basic relations between standard learning-theoretic complexity measures and list replicability numbers. Our results further imply that, apart from trivial cases, replicable algorithms (in the sense of Impagliazzo et al.) must be randomized. The proof of the impossibility result is based on a topological fixed-point theorem: for every algorithm, we locate a "hard input distribution" by applying the Poincaré-Miranda theorem in a related topological setting. The equivalence between global stability and list replicability is algorithmic.
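
To make the contrast between shared-randomness replicability and deterministic (randomness-free) behaviour concrete, the following is a small simulation of a toy task that is not taken from the paper: estimating the bias p of a coin and reporting it rounded to a grid of width alpha. The function names, the task, and all parameter values are illustrative assumptions. In "shared" mode both runs reuse one random grid offset, a randomized-rounding idea; in "deterministic" mode the offset is fixed at 0, so the algorithm uses no internal randomness at all.

```python
import random

# Toy illustration only: the task, names and parameters are hypothetical,
# not the construction used in the paper.


def estimate_bias(p, n, rng):
    """Empirical mean of n independent Bernoulli(p) samples drawn with rng."""
    return sum(rng.random() < p for _ in range(n)) / n


def round_to_grid(x, alpha, offset):
    """Round x to the nearest point of the grid {offset + k * alpha : k integer}."""
    return offset + alpha * round((x - offset) / alpha)


def agreement_rate(p, n, alpha, trials, mode, seed=0):
    """Estimate Pr[two runs on independent samples return the same output].

    mode="shared":        both runs reuse the same random grid offset
                          (replicability with shared internal randomness).
    mode="deterministic": no internal randomness; the grid offset is fixed at 0.
    """
    rng = random.Random(seed)
    agree = 0
    for _ in range(trials):
        if mode == "shared":
            offset = rng.uniform(0, alpha)  # internal randomness, reused by both runs
        elif mode == "deterministic":
            offset = 0.0
        else:
            raise ValueError(f"unknown mode: {mode}")
        out1 = round_to_grid(estimate_bias(p, n, rng), alpha, offset)  # first sample
        out2 = round_to_grid(estimate_bias(p, n, rng), alpha, offset)  # fresh i.i.d. sample
        agree += out1 == out2
    return agree / trials


if __name__ == "__main__":
    for mode in ("shared", "deterministic"):
        for p in (0.2, 0.3):
            print(mode, p, agreement_rate(p, n=1000, alpha=0.2, trials=400, mode=mode))
```

Under these assumptions, the shared-offset version should agree with high probability for every p once n is large relative to 1/alpha², because a random boundary rarely falls between two close estimates. The deterministic version agrees almost always at p = 0.2 but only about half the time at p = 0.3, which lies exactly on a rounding boundary. That boundary point is a one-dimensional cartoon of a "hard input distribution": as p moves continuously from 0.2 to 0.4, the probability of outputting 0.2 falls from near 1 to near 0, so by the intermediate value theorem (the one-dimensional case of Poincaré-Miranda) some p splits the output roughly evenly.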

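List replicability can be illustrated with the same hypothetical coin-bias task (again an assumption, not the paper's construction). Whenever the estimate lands within alpha/2 of p, deterministic rounding to the grid {k * alpha} returns one of at most two neighbouring grid points, so the learner behaves like a 2-list replicable algorithm even at a p where it is not replicable. The pigeonhole principle gives one easy link to global stability: if the output falls in a fixed list of k values with probability at least 1 - delta, some single value is output with probability at least (1 - delta)/k. The helper names below are illustrative, and the "list" is estimated empirically from the most frequent outputs.

```python
import random
from collections import Counter

# Toy illustration only: names and parameters are hypothetical.


def rounded_estimate(p, n, alpha, rng):
    """Deterministic learner: mean of n Bernoulli(p) samples, rounded to the grid {k * alpha}."""
    mean = sum(rng.random() < p for _ in range(n)) / n
    return alpha * round(mean / alpha)


def output_distribution(p, n, alpha, runs, seed=0):
    """Empirical distribution of the learner's output over independent samples."""
    rng = random.Random(seed)
    return Counter(rounded_estimate(p, n, alpha, rng) for _ in range(runs))


def top_k_coverage(counts, k):
    """Fraction of runs whose output is among the k most frequent outputs."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


if __name__ == "__main__":
    counts = output_distribution(p=0.3, n=1000, alpha=0.2, runs=400)
    print(counts)                     # mass sits on the grid points 0.2 and 0.4 next to p
    print(top_k_coverage(counts, 1))  # best single output: about one half at this boundary p
    print(top_k_coverage(counts, 2))  # list of size 2: covers essentially every run
```

The design choice here is to keep the learner deterministic so that any agreement comes only from concentration of the estimate; a list of size 2 then captures almost all of the output mass, while no single output does.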