Deep metric learning has been shown to be highly effective at learning semantic representations: the embedding it learns encodes information that can be used to measure similarity between data points. At the same time, the variational autoencoder (VAE) has been widely used for approximate inference and has proved to perform well for directed probabilistic models. However, in a traditional VAE, data label or feature information is intractable. Similarly, traditional representation learning approaches fail to represent many salient aspects of the data. In this project, we propose a novel integrated framework that learns the latent embedding of a VAE by incorporating deep metric learning. The features are learned by optimizing a triplet loss on the mean vectors of the VAE in conjunction with the standard evidence lower bound (ELBO) of the VAE. This approach, which we call the Triplet-based Variational Autoencoder (TVAE), allows us to capture more fine-grained information in the latent embedding. Our model is tested on the MNIST data set and achieves a triplet accuracy of 95.60%, while the traditional VAE (Kingma & Welling, 2013) achieves a triplet accuracy of 75.08%.
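To make the combined objective concrete, the following is a minimal sketch of a per-sample loss that adds a triplet term on the encoder mean vectors to the negative ELBO. It is an illustration under stated assumptions, not the implementation evaluated above: the squared Euclidean distance, the hinge form with a `margin`, the `triplet_weight` factor, applying the ELBO to the anchor only, and the definition of triplet accuracy as the fraction of triplets whose anchor lies closer to the positive than to the negative are all assumptions, and every function name is hypothetical.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a single sample."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(mu_anchor, mu_pos, mu_neg, margin=1.0):
    """Hinge triplet loss computed on encoder mean vectors (assumed form)."""
    gap = squared_distance(mu_anchor, mu_pos) - squared_distance(mu_anchor, mu_neg)
    return max(0.0, gap + margin)


def tvae_loss(recon_error, mu, logvar, mu_pos, mu_neg, margin=1.0, triplet_weight=1.0):
    """Negative ELBO of the anchor plus a weighted triplet term.

    recon_error is the reconstruction term of the anchor, computed elsewhere
    by the decoder; mu and logvar are the anchor's encoder outputs.
    """
    negative_elbo = recon_error + kl_to_standard_normal(mu, logvar)
    return negative_elbo + triplet_weight * triplet_loss(mu, mu_pos, mu_neg, margin)


def triplet_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1 for a, p, n in triplets if squared_distance(a, p) < squared_distance(a, n)
    )
    return correct / len(triplets)
```

In a training loop, `mu`, `mu_pos`, and `mu_neg` would be the encoder means of an anchor, a same-class sample, and a different-class sample. Setting `triplet_weight` to zero recovers the plain negative ELBO of a standard VAE in this sketch.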