Modern self-supervised representation learning methods often rely on empirical heuristics that lack theoretical grounding. In this study, we propose HyDeS, a theoretically grounded method based on multi-view mutual information maximization in a hyperspherical latent space, using the Shannon differential entropy with a non-parametric von Mises-Fisher (vMF) density estimator. We show that HyDeS biases the trained model towards foreground features of the images: it performs well on segmentation tasks such as PASCAL VOC, while it underperforms on fine-grained classification. We provide a detailed analysis of the induced latent-space geometry and learning dynamics, which can inform the design of other theoretically grounded self-supervised learning methods.
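To make the core ingredient concrete, the sketch below estimates the Shannon differential entropy of unit-norm embeddings with a leave-one-out vMF kernel density estimator. The vMF normalizer is C_d(kappa) = kappa^(d/2-1) / ((2 pi)^(d/2) I_(d/2-1)(kappa)), with the Bessel function computed from its power series. The function names, the fixed concentration `kappa`, and the `multiview_mi_estimate` surrogate (I(Z1; Z2) approximated as H(Z1) - H(Z1 | Z2), with the conditional modelled as a vMF centred on the paired view) are illustrative assumptions, not necessarily the exact HyDeS objective.

```python
import math

import numpy as np


def log_vmf_normalizer(dim: int, kappa: float, terms: int = 200) -> float:
    """log C_d(kappa) for the vMF density on the unit sphere S^{dim-1}.

    I_v(kappa) is evaluated from its power series in log space to avoid overflow.
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    v = dim / 2.0 - 1.0
    log_terms = [
        (2 * m + v) * math.log(kappa / 2.0) - math.lgamma(m + 1) - math.lgamma(m + v + 1)
        for m in range(terms)
    ]
    top = max(log_terms)
    log_bessel = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return v * math.log(kappa) - (dim / 2.0) * math.log(2 * math.pi) - log_bessel


def _normalize(z: np.ndarray) -> np.ndarray:
    # Project embeddings onto the unit hypersphere.
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def vmf_kde_entropy(z: np.ndarray, kappa: float) -> float:
    """Leave-one-out resubstitution entropy estimate (nats) with a vMF kernel."""
    z = _normalize(z)
    n, d = z.shape
    logits = kappa * (z @ z.T)              # log kernel values, up to log C_d(kappa)
    np.fill_diagonal(logits, -np.inf)       # leave-one-out: drop each point's own kernel
    row_max = logits.max(axis=1, keepdims=True)
    log_sum = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    log_density = log_vmf_normalizer(d, kappa) + log_sum - math.log(n - 1)
    return float(-log_density.mean())


def multiview_mi_estimate(z1: np.ndarray, z2: np.ndarray, kappa: float) -> float:
    """Hypothetical surrogate: H(Z1) minus a vMF-based estimate of H(Z1 | Z2)."""
    z1, z2 = _normalize(z1), _normalize(z2)
    d = z1.shape[1]
    log_cond = log_vmf_normalizer(d, kappa) + kappa * np.sum(z1 * z2, axis=1)
    cond_entropy = float(-log_cond.mean())
    return vmf_kde_entropy(z1, kappa) - cond_entropy
```

Under these assumptions, maximizing the surrogate rewards embeddings that are spread over the sphere (high marginal entropy) while paired views stay aligned (low conditional entropy). Collapsed embeddings, for example, drive the entropy term down to -log C_d(kappa) - kappa.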