The manifold hypothesis posits that high-dimensional data often lies on a lower-dimensional manifold and that using this manifold as the target space yields more efficient representations. Although many traditional manifold-based techniques exist for dimensionality reduction, their application to self-supervised learning has progressed slowly. The recent MSimCLR method combines manifold encoding with SimCLR, but it outperforms SimCLR only when the target encoding dimension is extremely low, which limits its applicability. This paper introduces a novel learning paradigm based on an unbalanced atlas (UA) that can surpass state-of-the-art self-supervised learning approaches. We designed and studied the DeepInfomax with an unbalanced atlas (DIM-UA) method, which adapts the Spatiotemporal DeepInfomax (ST-DIM) framework to the proposed UA paradigm. We demonstrate the efficacy of DIM-UA by training and evaluating it on the Atari Annotated RAM Interface (AtariARI) benchmark, a modified version of the Atari 2600 framework that produces annotated image samples for representation learning. The UA paradigm improves on existing algorithms significantly as the number of target encoding dimensions grows. For instance, with 16384 hidden units, DIM-UA achieves a mean F1 score, averaged over categories, of ~75%, compared with ~70% for ST-DIM.
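The headline result is a mean F1 score "averaged over categories." The snippet below is a minimal sketch of that kind of two-level averaging. It assumes that a per-state-variable F1 score (for example, from a linear probe) has already been computed and that each variable belongs to exactly one category. The variable names, category names, and scores are hypothetical placeholders, not AtariARI's actual labels or results.

```python
from statistics import mean


def category_mean_f1(
    per_variable_f1: dict[str, float],
    categories: dict[str, list[str]],
) -> float:
    """Average F1 within each category, then average the category means.

    Hypothetical helper: every category counts equally, no matter how many
    state variables it contains.
    """
    category_means = [
        mean(per_variable_f1[var] for var in variables)
        for variables in categories.values()
    ]
    return mean(category_means)


# Hypothetical per-variable probe F1 scores and category grouping.
per_var = {"player_x": 0.9, "player_y": 0.8, "ball_x": 0.6, "score": 0.7}
cats = {
    "agent_loc": ["player_x", "player_y"],
    "small_loc": ["ball_x"],
    "score": ["score"],
}

print(category_mean_f1(per_var, cats))  # category means 0.85, 0.6, 0.7
print(mean(per_var.values()))           # flat mean over variables
```

With these placeholder numbers, the category-level mean (about 0.717) differs from the flat mean over variables (0.75). This shows why the averaging scheme matters when F1 scores are compared across methods.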