In recent years, self-supervised learning has played a pivotal role in advancing machine learning by enabling models to learn meaningful representations from unlabeled data. A promising line of research develops self-supervised models within an information-theoretic framework, yet many of these studies depart from the stochasticity assumptions made when deriving their objectives. To better understand this discrepancy, we propose to model the representation explicitly with stochastic embeddings and assess their effects on performance, information compression, and their potential for out-of-distribution detection. From an information-theoretic perspective, we investigate how probabilistic modeling affects the information bottleneck, shedding light on a trade-off between compressing and preserving information in both the representation space and the loss space. Emphasizing the importance of distinguishing between these two spaces, we show how constraining one can affect the other, potentially degrading performance. Moreover, our findings suggest that introducing an additional bottleneck in the loss space can significantly improve the detection of out-of-distribution examples, using only either the representation features or the variance of their underlying distribution.
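The following is a minimal sketch of what "stochastic embeddings" and a variance-based out-of-distribution score can look like. It is not the method described above, whose exact objective the abstract does not specify. It assumes an encoder that outputs a diagonal Gaussian per example, parameterized by a mean `mu` and a log-variance `log_var`. Samples are drawn with the reparameterization trick. A KL divergence to a standard normal prior serves as a common representation-space bottleneck term, as in variational information bottleneck models. The mean predicted variance is used as an OOD score. All function names are hypothetical, and averaging the variance over dimensions is an illustrative choice.

```python
import numpy as np

def sample_embedding(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) via the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)      # noise independent of the parameters
    return mu + np.exp(0.5 * log_var) * eps  # z = mu + sigma * eps

def kl_to_standard_normal(mu, log_var):
    """Per-example KL(N(mu, diag(sigma^2)) || N(0, I)), a typical bottleneck penalty."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def variance_ood_score(log_var):
    """Hypothetical OOD score: mean predicted variance per example (higher = more uncertain)."""
    return np.mean(np.exp(log_var), axis=-1)

# Usage with made-up encoder outputs for a batch of 2 examples, 4 dims each.
rng = np.random.default_rng(0)
mu = np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
log_var = np.array([[0.0, 0.0, 0.0, 0.0], [np.log(4.0)] * 4])
z = sample_embedding(mu, log_var, rng)
print(z.shape, kl_to_standard_normal(mu, log_var), variance_ood_score(log_var))
```

Thresholding the score (for example, flagging examples whose variance exceeds a value chosen on in-distribution data) would turn it into a detector. That thresholding rule is likewise an assumption, not something the abstract states.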