Neural word embeddings learned solely from distributional information have consistently produced useful meaning representations for downstream tasks. However, existing approaches often result in representations that are hard to interpret and control. Natural language definitions, on the other hand, possess a recursive, self-explanatory semantic structure that can support novel representation learning paradigms able to preserve explicit conceptual relations and constraints in the vector space. This paper proposes a neuro-symbolic, multi-relational framework to learn word embeddings exclusively from natural language definitions by jointly mapping defined and defining terms along with their corresponding semantic relations. By automatically extracting the relations from definition corpora and formalising the learning problem via a translational objective, we specialise the framework to hyperbolic space to capture the hierarchical and multi-resolution structure induced by the definitions. An extensive empirical analysis demonstrates that the framework can help impose the desired structural constraints while preserving the mapping required for controllable and interpretable semantic navigation. Moreover, the experiments show that the hyperbolic word embeddings outperform their Euclidean counterparts and that the multi-relational framework obtains competitive results when compared to state-of-the-art neural approaches (including Transformers), with the advantage of being significantly more efficient and intrinsically interpretable.
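To make the idea of a translational objective in hyperbolic space concrete, the following is a minimal sketch, not the paper's implementation. It assumes the Poincaré ball model with curvature -1, treats the relation as a point that is Möbius-added to the defined term, and scores a triple by the negative squared hyperbolic distance to the defining term. The function names and the toy vectors are hypothetical, and the exact objective used in the paper may differ.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def mobius_add(x, y):
    """Mobius addition on the Poincare ball (curvature -1)."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * xy + xx * yy
    cx = (1 + 2 * xy + yy) / denom
    cy = (1 - xx) / denom
    return [cx * a + cy * b for a, b in zip(x, y)]

def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit ball."""
    diff = [a - b for a, b in zip(x, y)]
    num = 2 * dot(diff, diff)
    den = (1 - dot(x, x)) * (1 - dot(y, y))
    return math.acosh(1 + num / den)

def translational_score(head, relation, tail):
    """Assumed scoring rule: negative squared distance between
    (head translated by relation) and tail. Higher means more plausible."""
    translated = mobius_add(head, relation)
    return -(poincare_distance(translated, tail) ** 2)

# Hypothetical 2-D embeddings for a (defined term, relation, defining term) triple.
defined_term = [0.1, 0.2]
relation = [0.3, -0.1]
matching_term = mobius_add(defined_term, relation)  # a tail that fits perfectly
unrelated_term = [-0.5, 0.4]

good = translational_score(defined_term, relation, matching_term)
bad = translational_score(defined_term, relation, unrelated_term)
```

Under these assumptions, `good` is higher than `bad`, which is the property a training loss would exploit when separating observed definition triples from corrupted ones.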