Embedding learning transforms discrete data entities into continuous numerical representations that encode the entities' features and properties. Despite the outstanding performance reported for different embedding learning algorithms, few efforts have been devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features to the less-interpretable embedding vectors. We also develop an interactive visualization tool based on EmbeddingTree for exploring high-dimensional embeddings. The tool helps users discover nuanced features of data entities, perform feature denoising and injection during embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and the visualization tool on embeddings generated for industry-scale merchant data and for the public 30Music listening and playlists dataset.