In this paper, we study simplicial representation learning with the 1-Wasserstein distance on a tree structure (a.k.a. the Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. Specifically, we consider a framework for estimating simplicial representations through self-supervised learning based on SimCLR, with the negative TWD as the similarity measure. SimCLR typically uses cosine similarity with real-vector embeddings; L1-based measures with simplicial embeddings have not been well studied. A key challenge is that training with the L1 distance is numerically difficult and often yields unsatisfactory outcomes, and there are numerous choices of probability model. This study therefore empirically investigates strategies for optimizing self-supervised learning with TWD and identifies a stable training procedure. More specifically, we evaluate combinations of two types of TWD (total variation and ClusterTree) with several simplicial models, including the softmax function, the ArcFace probability model, and simplicial embedding. Moreover, we propose a simple yet effective Jeffrey divergence-based regularization method to stabilize the optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we first find that the simple combination of the softmax function and TWD yields significantly worse results than standard SimCLR (non-simplicial model and cosine similarity). We also find that model performance depends on the combination of TWD and simplicial model, and that the Jeffrey divergence regularization usually helps model training. Finally, we infer that an appropriate combination of TWD and simplicial model outperforms cosine-similarity-based representation learning.
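To make the quantities named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common matrix form of TWD, in which a hypothetical 0/1 matrix `B` marks which leaves lie below each tree edge and `w` holds the edge weights, so that TWD is the weighted L1 norm of `B(a - b)`. The total-variation case is modeled here as a star tree with edge weight 1/2, and the Jeffrey divergence is taken as the symmetrized KL divergence. All function names are illustrative, and how the regularizer enters the training loss is not specified in the text, so it is left out.

```python
import math

def softmax(logits):
    """Map real-valued logits onto the probability simplex."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tree_wasserstein(a, b, B, w):
    """TWD as an L1 distance between tree-embedded vectors.

    Assumption: B[e][i] is 1 if leaf i lies in the subtree below edge e
    (0 otherwise), and w[e] is the weight of edge e.
    """
    diff = [x - y for x, y in zip(a, b)]
    total = 0.0
    for row, weight in zip(B, w):
        mass = sum(r * d for r, d in zip(row, diff))  # net mass crossing edge e
        total += weight * abs(mass)
    return total

def total_variation_tree(n):
    """Star tree (assumed): one edge per leaf, each with weight 1/2."""
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    w = [0.5] * n
    return B, w

def similarity(a, b, B, w):
    """Negative TWD, used in place of cosine similarity."""
    return -tree_wasserstein(a, b, B, w)

def jeffrey_divergence(a, b, eps=1e-12):
    """Symmetrized KL divergence: KL(a||b) + KL(b||a)."""
    return sum((x - y) * (math.log(x + eps) - math.log(y + eps))
               for x, y in zip(a, b))

# Two views embedded on the simplex via softmax (logits are made up).
p = softmax([2.0, 0.0, -1.0])
q = softmax([0.0, 1.0, 0.0])
B, w = total_variation_tree(3)
sim = similarity(p, q, B, w)
reg = jeffrey_divergence(p, q)
```

With the star tree, `tree_wasserstein` reduces to half the L1 distance between the two probability vectors, which is the total variation distance. A ClusterTree variant would only change `B` and `w`.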