Representation learning on Knowledge Graphs (KGs) is essential for downstream tasks. The dominant approach, KG Embedding (KGE), represents each entity with an independent vector and therefore faces a scalability challenge. Recent studies propose a more parameter-efficient alternative that represents each entity by composing the codewords matched to it from small, predefined codebooks. We refer to the process of obtaining the codewords for each entity as entity quantization, for which previous works have designed complicated strategies. Surprisingly, this paper shows that simple random entity quantization achieves results similar to those of current strategies. We analyze this phenomenon and find that entity codes, the quantization outcomes used to express entities, have higher entropy at the code level and higher Jaccard distance at the codeword level under random entity quantization. Different entities are therefore easier to distinguish, which facilitates effective KG representation. These results show that current quantization strategies are not critical for KG representation, and that there is still room to improve entity distinguishability beyond current strategies. The code to reproduce our results is available at https://github.com/JiaangL/RandomQuantization.
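The two distinguishability measures named in the abstract can be illustrated with a minimal sketch. It is not taken from the linked repository: the function names, the parameters (`num_entities`, `codebook_size`, `codewords_per_entity`), and the exact definitions used here (Shannon entropy over the empirical distribution of whole entity codes, and mean pairwise Jaccard distance between codeword sets) are assumptions made for illustration, and the paper's own formulations may differ.

```python
import math
import random
from collections import Counter
from itertools import combinations


def random_entity_quantization(num_entities, codebook_size,
                               codewords_per_entity, seed=0):
    """Assign each entity a random set of codeword indices (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        frozenset(rng.sample(range(codebook_size), codewords_per_entity))
        for _ in range(num_entities)
    ]


def code_entropy(codes):
    """Shannon entropy (bits) of the empirical distribution over whole entity codes."""
    counts = Counter(codes)
    total = len(codes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def jaccard_distance(a, b):
    """1 - |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_jaccard(codes):
    """Average Jaccard distance over all unordered pairs of entity codes."""
    pairs = list(combinations(codes, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sizes chosen only for illustration, not taken from the paper.
    codes = random_entity_quantization(
        num_entities=200, codebook_size=50, codewords_per_entity=5
    )
    print("code-level entropy (bits):", code_entropy(codes))
    print("mean codeword-level Jaccard distance:", mean_pairwise_jaccard(codes))
```

Under these assumed definitions, entropy reaches its maximum of log2(number of entities) when every entity receives a distinct code, and a mean Jaccard distance close to 1 means that entity codes share few codewords. Both indicate that entities are easy to tell apart.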