In this paper, we present generalization bounds for the unsupervised risk in the Deep Contrastive Representation Learning framework, which employs deep neural networks as representation functions. We approach this problem from two angles. On the one hand, we derive a parameter-counting bound that scales with the overall size of the neural networks. On the other hand, we provide a norm-based bound that scales with the norms of the neural networks' weight matrices. Ignoring logarithmic factors, both bounds are independent of $k$, the size of the tuples provided for contrastive learning. To the best of our knowledge, this property is shared by only one other work, which employed a different proof strategy and suffers from a very strong exponential dependence on the depth of the network, owing to its use of the peeling technique. Our results circumvent this by leveraging powerful results on covering numbers with respect to uniform norms over samples. In addition, we utilize loss-augmentation techniques to further reduce the dependency on matrix norms and the implicit dependence on network depth. In fact, our techniques yield many bounds for the contrastive learning setting with architectural dependencies similar to those arising in the study of the sample complexity of ordinary loss functions, thereby bridging the gap between the learning theories of contrastive learning and DNNs.