In this paper, we present generalization bounds for the unsupervised risk in the Deep Contrastive Representation Learning framework, which employs deep neural networks as representation functions. We approach this problem from two angles. On the one hand, we derive a parameter-counting bound that scales with the overall size of the neural networks. On the other hand, we provide a norm-based bound that scales with the norms of the neural networks' weight matrices. Ignoring logarithmic factors, the bounds are independent of $k$, the size of the tuples provided for contrastive learning. To the best of our knowledge, only one other work shares this property; it employed a different proof strategy and suffers from a strong exponential dependence on network depth caused by its use of the peeling technique. Our results circumvent this by leveraging powerful results on covering numbers with respect to the uniform norm over samples. In addition, we use loss-augmentation techniques to further reduce the dependence on matrix norms and the implicit dependence on network depth. In fact, our techniques yield many bounds for the contrastive learning setting whose architectural dependencies match those found in the study of the sample complexity of ordinary loss functions, thereby bridging the gap between the learning theories of contrastive learning and deep neural networks.
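For concreteness, one common formulation of the unsupervised contrastive risk with $k$-tuples, in the spirit of the framework of Arora et al., is sketched below; this display is an illustrative assumption and the precise loss studied in the paper may differ.
\[
  % Illustrative (assumed) form of the unsupervised contrastive risk:
  % f is the representation function (e.g., a deep network), (x, x^+) a positive pair,
  % and x_1^-, \dots, x_k^- the k negative samples forming the tuple.
  L_{\mathrm{un}}(f)
  \;=\;
  \mathbb{E}_{x,\,x^+,\,\{x_i^-\}_{i=1}^{k}}
  \Big[
    \ell\big(\{\, f(x)^\top\big(f(x^+) - f(x_i^-)\big) \,\}_{i=1}^{k}\big)
  \Big],
\]
where $\ell$ could, for instance, be the $k$-way logistic loss. The generalization bounds discussed above control the gap between $L_{\mathrm{un}}$ and its empirical counterpart, with at most logarithmic dependence on $k$.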