Hierarchical knowledge structures are ubiquitous across real-world domains and play a vital role in organizing information from coarse to fine semantic levels. While such structures have been widely used in taxonomy systems, biomedical ontologies, and retrieval-augmented generation, their potential remains underexplored in the context of Text-Rich Networks (TRNs), where each node contains rich textual content and edges encode semantic relationships. Existing methods for learning on TRNs often focus on flat semantic modeling, overlooking the inherent hierarchical semantics embedded in textual documents. To this end, we propose TIER (Hierarchical \textbf{T}axonomy-\textbf{I}nformed R\textbf{E}presentation Learning on Text-\textbf{R}ich Networks), which first constructs an implicit hierarchical taxonomy and then integrates it into the learned node representations. Specifically, TIER employs similarity-guided contrastive learning to build a clustering-friendly embedding space, upon which it performs hierarchical K-Means followed by LLM-powered clustering refinement to enable semantically coherent taxonomy construction. Leveraging the resulting taxonomy, TIER introduces a cophenetic correlation coefficient-based regularization loss to align the learned embeddings with the hierarchical structure. By learning representations that respect both fine-grained and coarse-grained semantics, TIER enables more interpretable and structured modeling of real-world TRNs. We demonstrate that our approach significantly outperforms existing methods on multiple datasets across diverse domains, highlighting the importance of hierarchical knowledge learning for TRNs.
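To make the cophenetic-correlation idea concrete, the following is a minimal sketch, not the exact TIER loss, which the abstract does not specify. It assumes each node carries a root-to-leaf path of cluster ids (for example, the output of hierarchical K-Means), defines the tree distance between two nodes as the number of levels below their lowest common ancestor, and takes the regularizer to be one minus the Pearson correlation between pairwise embedding distances and tree distances. The function names, the Euclidean metric, and the toy data are all illustrative assumptions.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def tree_distance(path_u, path_v):
    """Levels below the lowest common ancestor of two root-to-leaf paths.

    Assumption: both paths have the same depth and list cluster ids
    from the coarsest level to the finest.
    """
    shared = 0
    for a, b in zip(path_u, path_v):
        if a != b:
            break
        shared += 1
    return len(path_u) - shared


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cophenetic_correlation(embeddings, paths):
    """Correlation between embedding distances and taxonomy distances."""
    pairs = list(combinations(range(len(embeddings)), 2))
    emb_d = [euclidean(embeddings[i], embeddings[j]) for i, j in pairs]
    tree_d = [tree_distance(paths[i], paths[j]) for i, j in pairs]
    return pearson(emb_d, tree_d)


def ccc_regularizer(embeddings, paths):
    """Hypothetical regularizer: small when embeddings follow the taxonomy."""
    return 1.0 - cophenetic_correlation(embeddings, paths)


# Toy two-level taxonomy: (coarse cluster id, fine cluster id) per node.
paths = [(0, 0), (0, 1), (1, 2), (1, 3)]
aligned = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]
misaligned = [[0.0, 0.0], [10.0, 0.0], [1.0, 0.0], [11.0, 0.0]]

loss_aligned = ccc_regularizer(aligned, paths)
loss_misaligned = ccc_regularizer(misaligned, paths)
```

In this toy setup, embeddings that keep coarse-level siblings close together receive a lower penalty than embeddings that scatter them. A trainable version would compute the same quantity with differentiable tensor operations over mini-batches rather than Python lists.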