Self-supervised learning (SSL) learns representations from massive unlabeled data, yet the resulting models typically operate as black boxes, necessitating domain-specific explanations. We introduce KREPES, a unified framework to analytically interpret the learned representations of SSL objectives, including SimCLR, BYOL, and VICReg. By bridging empirical neural tangent kernel approximations of neural networks with the Representer Theorem for kernels, we express the learned latent space directly via "Representer Landmarks", which are the representations of influential unlabeled training examples. We introduce novel metrics, "Sample-Specific Influence Score", "Concept-Conditioned Influence Score" and "Feature Alignment Gap", to quantify the transparency of the learned representations. KREPES enables direct audit of the latent space without supervision, for example, revealing an algorithmic bias in the Adult-1M dataset where SSL uses demographic proxies for income. Finally, to ensure scalability to benchmarks with 1M+ samples (ImageNet-1K, Adult-1M), KREPES introduces a novel Nyström approximation-based analytical inference framework for SSL objectives.
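To make the representer-style view above concrete, the following is a minimal sketch of a Nyström-approximated kernel regressor whose outputs decompose into per-landmark contributions. It is not the KREPES implementation. Several pieces are assumptions made for illustration only: an RBF kernel stands in for the empirical neural tangent kernel, the targets `y` are synthetic rather than representations learned by an SSL objective, landmarks are chosen uniformly at random, and the helper names (`rbf_kernel`, `fit_nystrom_representer`, `landmark_contributions`) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    # Stand-in kernel (assumption); the abstract refers to an empirical NTK.
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_nystrom_representer(x, y, landmark_idx, lam=1e-3):
    # Nystrom / subset-of-regressors kernel ridge regression:
    #   alpha = (K_nm^T K_nm + lam * K_mm)^{-1} K_nm^T y
    # so that f(x) = sum_j k(x, z_j) * alpha_j over the m landmarks z_j.
    z = x[landmark_idx]
    k_nm = rbf_kernel(x, z)          # shape (n, m)
    k_mm = rbf_kernel(z, z)          # shape (m, m)
    a = k_nm.T @ k_nm + lam * k_mm + 1e-8 * np.eye(len(z))  # small jitter for stability
    alpha = np.linalg.solve(a, k_nm.T @ y)                  # shape (m, d)
    return z, alpha

def landmark_contributions(x_query, z, alpha):
    # Contribution of landmark j to the output for query i: k(x_i, z_j) * alpha_j.
    k_qm = rbf_kernel(x_query, z)                 # shape (q, m)
    return k_qm[:, :, None] * alpha[None, :, :]   # shape (q, m, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
# Toy stand-in for learned representations (assumption, not SSL output).
y = np.tanh(x @ rng.normal(size=(5, 3)))
landmark_idx = rng.choice(len(x), size=20, replace=False)

z, alpha = fit_nystrom_representer(x, y, landmark_idx)
contrib = landmark_contributions(x[:4], z, alpha)
pred = contrib.sum(axis=1)  # summing over landmarks recovers the prediction
```

Because the prediction is an explicit sum over landmarks, the magnitude of each slice `contrib[i, j]` gives one rough notion of how much landmark `j` shapes the output for query `i`. How KREPES actually defines its influence scores is not specified in the abstract, so this decomposition should be read only as an illustration of the general idea.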