Empirical data can often be viewed as samples drawn from a set of probability distributions, and kernel methods have emerged as a natural approach for learning to classify these distributions. Although numerous kernels between distributions have been proposed, applying kernel methods to distribution regression tasks remains challenging, primarily because selecting a suitable kernel is not straightforward. Surprisingly, the question of learning a data-dependent distribution kernel has received little attention. In this paper, we propose a novel objective for the unsupervised learning of a data-dependent distribution kernel, based on the principle of entropy maximization in the space of probability-measure embeddings. We examine the theoretical properties of the latent embedding space induced by our objective, showing that its geometric structure is well suited to downstream discriminative tasks. Finally, we demonstrate the performance of the learned kernel across different data modalities.
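To make the setting concrete, the sketch below shows a standard (non-learned) kernel between distributions built from kernel mean embeddings: each sample set is embedded via an RBF base kernel, the squared MMD measures the distance between embeddings, and a Gaussian kernel on that distance yields a distribution kernel. This is a minimal illustration of the fixed-kernel baseline the abstract contrasts with, not the paper's learned, data-dependent kernel; all function names and the choice of base kernel are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, Y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix between point sets X and Y.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Squared maximum mean discrepancy between the empirical distributions
    # of X and Y: the squared distance between their kernel mean embeddings.
    return (rbf_gram(X, X, gamma).mean()
            + rbf_gram(Y, Y, gamma).mean()
            - 2.0 * rbf_gram(X, Y, gamma).mean())

def distribution_kernel(X, Y, gamma=1.0, sigma=1.0):
    # A Gaussian kernel on probability distributions, defined on top of
    # the MMD; this is the kind of fixed distribution kernel whose
    # hyperparameters (gamma, sigma) must otherwise be hand-tuned.
    return np.exp(-mmd2(X, Y, gamma) / (2.0 * sigma**2))
```

Such a kernel is symmetric and equals 1 when both arguments are the same sample set; its behavior depends entirely on the hand-chosen base kernel, which is precisely the selection problem a data-dependent learned kernel aims to sidestep.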