Existing contrastive learning methods rely on a pairwise sample contrast $z_x^\top z_{x'}$ to learn data representations, but the learned features often lack clear interpretability from a human perspective. Theoretically, these methods also lack feature identifiability: different initializations may lead to entirely different features. In this paper, we study a new method named tri-factor contrastive learning (triCL) that uses a 3-factor contrast of the form $z_x^\top S z_{x'}$, where $S=\text{diag}(s_1,\dots,s_k)$ is a learnable diagonal matrix that automatically captures the importance of each feature. We show that, with this simple extension, triCL not only obtains identifiable features that eliminate this randomness but also yields more interpretable features that are ordered according to the importance matrix $S$. We further show that features with high importance are interpretable in that they capture common class-wise features, and that they achieve superior performance on image retrieval when only a few features are used. The proposed triCL objective is general and can be applied to different contrastive learning methods such as SimCLR and CLIP. We believe that triCL is a better alternative to existing 2-factor contrastive learning, improving identifiability and interpretability with minimal overhead. Code is available at https://github.com/PKU-ML/Tri-factor-Contrastive-Learning.
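To make the 3-factor contrast $z_x^\top S z_{x'}$ concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes the pairwise tri-factor scores for a batch, plugs them into a generic InfoNCE-style loss (an assumption used here as a stand-in, since the exact triCL objective is not stated above), and selects the highest-importance features as one might for retrieval with a few features. The function names, the temperature value, and the fixed vector `s` are all hypothetical; in triCL the diagonal of $S$ is learned rather than fixed.

```python
import numpy as np

def trifactor_similarity(z_a, z_b, s):
    """Pairwise scores z_a[i]^T diag(s) z_b[j] for all i, j.

    z_a, z_b: arrays of shape (n, k); s: importance vector of shape (k,).
    Scaling the columns of z_a by s is equivalent to multiplying by diag(s).
    """
    return (z_a * s) @ z_b.T

def infonce_trifactor(z_a, z_b, s, temperature=0.5):
    """Generic InfoNCE-style loss on tri-factor scores (illustrative stand-in)."""
    logits = trifactor_similarity(z_a, z_b, s) / temperature
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal: row i of z_a matches row i of z_b.
    return -np.mean(np.diag(log_prob))

def top_features(z, s, m):
    """Keep the m feature dimensions with the largest importance values."""
    order = np.argsort(-s)[:m]
    return z[:, order]

# Toy data: 4 samples, 8 feature dimensions, two noisy "views" per sample.
rng = np.random.default_rng(0)
z_a = rng.normal(size=(4, 8))
z_b = z_a + 0.1 * rng.normal(size=(4, 8))
s = np.linspace(1.0, 0.1, 8)  # fixed here for illustration; learnable in triCL

scores = trifactor_similarity(z_a, z_b, s)
loss = infonce_trifactor(z_a, z_b, s)
z_top = top_features(z_a, s, 3)
```

Because `s` multiplies each feature dimension separately, sorting by `s` gives a natural ordering of the features, which is what makes truncating to the top few dimensions meaningful in this sketch.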