Hypergraphs have complex topology and use hyperedges to express higher-order interactions among multiple entities. Recently, hypergraph-based deep learning methods that learn informative data representations for node classification on text-attributed hypergraphs have attracted increasing research attention. However, existing methods struggle to capture both the full extent of hypergraph structural information and the rich linguistic attributes of the nodes, which largely hampers their effectiveness and generalizability. To overcome these challenges, we explore ways to augment a pretrained BERT model with specialized hypergraph-aware layers for node classification. These layers introduce a higher-order structural inductive bias into the language model, improving its capacity to harness both higher-order context from the hypergraph structure and the semantic information present in the text. In this paper, we propose HyperBERT, a new mixed text-hypergraph architecture that models hypergraph relational structure while maintaining the high-quality text-encoding capabilities of a pretrained BERT. Notably, HyperBERT achieves new state-of-the-art results on five challenging text-attributed hypergraph node classification benchmarks.
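To make the notion of higher-order structure concrete, the following minimal sketch shows a generic two-stage mean aggregation over a toy hypergraph: node features are first averaged within each hyperedge, and each node then averages the features of its incident hyperedges. This is an illustration only, not the HyperBERT layer, whose design is not specified in this abstract. The function names `hyperedge_means` and `node_updates` and the one-dimensional toy features are assumptions; in a text-attributed setting the features would plausibly be text embeddings.

```python
from typing import List, Set


def hyperedge_means(features: List[List[float]],
                    hyperedges: List[Set[int]]) -> List[List[float]]:
    """Average the feature vectors of the nodes in each hyperedge.

    Assumes every hyperedge is non-empty.
    """
    dim = len(features[0])
    means = []
    for members in hyperedges:
        total = [0.0] * dim
        for v in members:
            for k in range(dim):
                total[k] += features[v][k]
        means.append([t / len(members) for t in total])
    return means


def node_updates(features: List[List[float]],
                 hyperedges: List[Set[int]]) -> List[List[float]]:
    """Nodes -> hyperedges -> nodes mean aggregation (hypothetical helper)."""
    dim = len(features[0])
    edge_feats = hyperedge_means(features, hyperedges)
    updated = []
    for v in range(len(features)):
        # Indices of the hyperedges that contain node v.
        incident = [e for e, members in enumerate(hyperedges) if v in members]
        if not incident:
            # An isolated node keeps its own features.
            updated.append(list(features[v]))
            continue
        updated.append([
            sum(edge_feats[e][k] for e in incident) / len(incident)
            for k in range(dim)
        ])
    return updated


# Toy hypergraph: 4 nodes, hyperedges {0, 1, 2} and {2, 3}.
toy_features = [[1.0], [2.0], [3.0], [4.0]]
toy_hyperedges = [{0, 1, 2}, {2, 3}]
updated = node_updates(toy_features, toy_hyperedges)
print(updated)  # [[2.0], [2.0], [2.75], [3.5]]
```

Node 2 belongs to both hyperedges, so its updated feature mixes information from all four nodes. A single hyperedge can connect any number of nodes at once, which is the kind of group-level context that pairwise graph edges cannot express directly.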