Transforming Graphs for Enhanced Attribute-Based Clustering: An Innovative Graph Transformer Method

Graph Representation Learning (GRL) is an influential approach that enables a deeper understanding of graph-structured data and supports graph clustering, a critical task across many domains. Attention mechanisms, which originated in Natural Language Processing (NLP), have recently been adopted in graph learning and have shifted research trends. As a result, Graph Attention Networks (GATs) and Graph Attention Auto-Encoders have become preferred tools for graph clustering. However, these methods rely primarily on a local attention mechanism, which limits their ability to capture the complex global dependencies between nodes in a graph. To address this limitation, this study introduces the Graph Transformer Auto-Encoder for Graph Clustering (GTAGC). By combining the Graph Auto-Encoder with the Graph Transformer, GTAGC captures global dependencies between nodes, which strengthens the graph representation and overcomes the constraints of local attention. The GTAGC architecture comprises three parts: graph embedding, a Graph Transformer integrated into the auto-encoder structure, and a clustering component. Training alternates between graph embedding and clustering, which adapts the Graph Transformer to the clustering task while preserving the graph's global structural information. In extensive experiments on diverse benchmark datasets, GTAGC outperforms existing state-of-the-art graph clustering methods.
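
The contrast between local and global attention drawn above can be illustrated with a minimal NumPy sketch. It is not the GTAGC implementation: it assumes a simplified single-head dot-product attention with no learned projections (GAT itself uses a different, learned scoring function), and the names `attention`, `softmax_rows`, `adjacency`, and `features` are hypothetical. The only point it makes is that masking attention scores with the adjacency matrix gives non-adjacent nodes zero weight, whereas unmasked attention lets every node attend to every other node.

```python
import numpy as np


def softmax_rows(scores):
    # Numerically stable softmax over each row; -inf entries receive weight 0.
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


def attention(features, adjacency=None):
    """Single-head dot-product self-attention over node features (illustrative).

    If `adjacency` is given, scores between non-adjacent nodes are masked out,
    which mimics local attention. If it is None, all node pairs are scored,
    which mimics global attention.
    """
    n, d = features.shape
    scores = features @ features.T / np.sqrt(d)
    if adjacency is not None:
        # Keep neighbours plus a self-loop so every row has a finite score.
        mask = (adjacency + np.eye(n)) > 0
        scores = np.where(mask, scores, -np.inf)
    weights = softmax_rows(scores)
    return weights @ features, weights


# Toy path graph 0 - 1 - 2 - 3 (assumed example data, not from the paper).
adjacency = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 3))

_, local_w = attention(features, adjacency)   # neighbours only
_, global_w = attention(features)             # all node pairs

print("local weight 0 -> 3: ", local_w[0, 3])
print("global weight 0 -> 3:", global_w[0, 3])
```

In this toy graph, nodes 0 and 3 are three hops apart, so the local variant assigns them no attention weight in a single layer, while the global variant assigns a positive weight directly.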
