Natural language definitions possess a recursive, self-explanatory semantic structure that can support representation learning methods able to preserve explicit conceptual relations and constraints in the latent space. This paper presents a multi-relational model that explicitly leverages this structure to derive word embeddings from definitions. By automatically extracting the relations that link defined and defining terms in dictionaries, we show how the problem of learning word embeddings can be formalised via a translational framework in Hyperbolic space and used as a proxy to capture the global semantic structure of definitions. An extensive empirical analysis demonstrates that the framework helps impose the desired structural constraints while preserving the semantic mapping required for controllable and interpretable traversal. Moreover, the experiments show that Hyperbolic word embeddings outperform their Euclidean counterparts and that the multi-relational approach obtains competitive results when compared to state-of-the-art neural models, with the advantage of being intrinsically more efficient and interpretable.
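To make the idea of a translational framework in Hyperbolic space more concrete, the following is a minimal sketch of how a (defined term, relation, defining term) triple could be scored on the Poincaré ball. It assumes unit negative curvature and a simple "translate, then measure distance" score; the abstract does not specify the scoring function, so the function names (`mobius_add`, `poincare_distance`, `score`) and the toy vectors are illustrative assumptions rather than the authors' implementation.

```python
import math


def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1), the hyperbolic analogue of x + y."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1 + 2 * xy + x2 * y2
    return [((1 + 2 * xy + y2) * a + (1 - x2) * b) / denom for a, b in zip(x, y)]


def poincare_distance(x, y):
    """Geodesic distance d(x, y) = 2 * artanh(||(-x) (+) y||) on the Poincare ball."""
    diff = mobius_add([-a for a in x], y)
    norm = math.sqrt(sum(d * d for d in diff))
    # Clamp just below 1 so artanh stays finite for points near the boundary.
    return 2 * math.atanh(min(norm, 1 - 1e-9))


def score(defined_term, relation, defining_term):
    """Assumed translational score: higher when defined_term (+) relation lies near defining_term."""
    translated = mobius_add(defined_term, relation)
    return -poincare_distance(translated, defining_term) ** 2


# Hypothetical 2-d embeddings, chosen only for illustration (all inside the unit ball).
defined_term = [0.10, 0.20]
relation = [0.30, -0.10]
good_match = mobius_add(defined_term, relation)  # exactly where the translation lands
poor_match = [-0.60, 0.50]

print(score(defined_term, relation, good_match))
print(score(defined_term, relation, poor_match))
```

In this sketch, a defining term that sits where the relation-specific translation lands receives a score near zero, while a distant one receives a strongly negative score. A Euclidean counterpart would replace `mobius_add` with ordinary vector addition and `poincare_distance` with the Euclidean norm of the difference.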