Taxonomy expansion is the process of incorporating a large number of additional nodes (i.e., "queries") into an existing taxonomy (i.e., the "seed"), the most important step being the selection of an appropriate position for each query. Considerable effort has gone into exploiting the seed's structure. However, existing approaches mine this structural information inadequately in two respects: they model hierarchical semantics poorly, and they fail to capture the directionality of the is-a relation. This paper addresses both issues by explicitly representing each node as the combination of an inherited feature (i.e., the structural part) and an incremental feature (i.e., the supplementary part). Specifically, the inherited feature originates from the node's "parent" nodes and is weighted by an inheritance factor. With this node representation, the semantic hierarchy of a taxonomy (i.e., the inheritance and accumulation of features from "parent" to "child") can be embodied directly. Moreover, the directionality of the is-a relation translates naturally into the irreversible inheritance of features. Inspired by the Darmois-Skitovich theorem, we enforce this irreversibility with a non-Gaussian constraint on the supplementary feature. A log-likelihood learning objective is then used to optimize the proposed model (dubbed DNG), and this objective also theoretically guarantees the required non-Gaussianity. Extensive experiments on two real-world datasets verify the superiority of DNG over several strong baselines.
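Why a non-Gaussian supplementary feature makes inheritance irreversible can be illustrated with a toy sketch. It assumes a scalar linear model, child = alpha * parent + incremental noise. This is a LiNGAM-style pairwise check chosen for illustration, not DNG's actual representation, learning objective, or code, and the names `simulate`, `dependence`, and `alpha` are hypothetical. The check works as follows. Regressing in the true direction (child on parent) leaves a residual that is independent of the regressor. When parent and noise are Gaussian, the reversed regression also yields an independent residual, so the direction cannot be told apart. With non-Gaussian (here uniform) features, Darmois-Skitovich rules this out, and the reversed residual stays dependent on its regressor.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(x, y):
    """Least-squares fit y ~ slope * x + intercept; return y minus the fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


def dependence(x, y):
    """Crude dependence score between regressor x and the residual of y on x.

    The OLS residual is linearly uncorrelated with x by construction, so we
    correlate their squares instead: this is ~0 if x and the residual are
    independent, and a clearly nonzero value indicates dependence.
    """
    r = residuals(x, y)
    return abs(pearson([a * a for a in x], [e * e for e in r]))


def simulate(noise, n=20000, alpha=0.8, seed=0):
    """Hypothetical toy model: child = alpha * parent + incremental noise."""
    rng = random.Random(seed)
    parent = [noise(rng) for _ in range(n)]
    child = [alpha * p + noise(rng) for p in parent]
    forward = dependence(parent, child)   # regress child on parent (true direction)
    backward = dependence(child, parent)  # regress parent on child (reversed)
    return forward, backward


def uniform(rng):
    """Non-Gaussian feature distribution."""
    return rng.uniform(-1.0, 1.0)


def gaussian(rng):
    """Gaussian feature distribution."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    print("uniform  (fwd, bwd): %.3f, %.3f" % simulate(uniform))
    print("gaussian (fwd, bwd): %.3f, %.3f" % simulate(gaussian))
```

With uniform features, the forward score should be near zero and the backward score clearly larger. With Gaussian features, both scores should be near zero, which is the unidentifiable case the non-Gaussian constraint is meant to exclude.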