Recently, contrastive learning has achieved impressive success in advancing the state of the art across various machine learning tasks. However, the existing generalization analysis is very limited, and in some cases not meaningful. In particular, existing generalization error bounds depend linearly on the number $k$ of negative examples, whereas it has been widely shown in practice that a large $k$ is necessary to guarantee good generalization of contrastive learning in downstream tasks. In this paper, we establish novel generalization bounds for contrastive learning that do not depend on $k$, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds, which imply fast rates under a low-noise condition. We apply our results to learning with both linear representations and nonlinear representations given by deep neural networks; for both, we derive Rademacher complexity bounds to obtain improved generalization bounds.