Federated graph learning (FGL) enables collaborative training on graph data across multiple clients. With the rise of large language models (LLMs), the textual attributes of graphs in FGL are gaining attention. Text-attributed graph federated learning (TAG-FGL) improves FGL by explicitly leveraging LLMs to process and integrate these textual features. However, current TAG-FGL methods face three main challenges: \textbf{(1) Overhead.} Using LLMs to process long texts incurs high token and computation costs; to make TAG-FGL practical, we introduce graph condensation (GC) to reduce the computation load, but this choice also raises new issues. \textbf{(2) Suboptimality.} GC compresses multi-hop texts and neighborhoods into a condensed core using fixed LLM surrogates, yet this one-shot condensation is rarely client-adaptive, leading to suboptimal performance. \textbf{(3) Interpretability.} LLM-based condensation introduces a black-box bottleneck: summaries lack faithful attribution and clear grounding in specific source spans, making local inspection and auditing difficult. To address these issues, we propose \textbf{DANCE}, a new TAG-FGL paradigm built on GC. To mitigate \textbf{suboptimal} performance, DANCE performs a round-wise, model-in-the-loop condensation refresh using the latest global model. To improve \textbf{interpretability}, DANCE preserves provenance by storing locally inspectable evidence packs that trace predictions back to the selected neighbors and source text spans. Across 8 TAG datasets, DANCE improves accuracy by \textbf{2.33\%} at an \textbf{8\%} condensation ratio while using \textbf{33.42\%} fewer tokens than baselines.
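The two mechanisms above, a per-round condensation refresh driven by the current global model and provenance-preserving evidence packs, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all names (`EvidencePack`, `condense`, `refresh_round`) and the fixed top-2 neighbor budget are assumptions standing in for DANCE's actual condensation procedure and 8\% ratio.

```python
# Hypothetical sketch: round-wise, model-in-the-loop condensation refresh
# with provenance-preserving evidence packs (names are illustrative).
from dataclasses import dataclass

@dataclass
class EvidencePack:
    """Traces a condensed summary back to its sources for local auditing."""
    node_id: int
    neighbor_ids: list   # neighbors kept during condensation
    source_spans: list   # (neighbor_id, start, end) character spans
    summary: str         # condensed text fed to the downstream model

def condense(node_id, neighbors, texts, global_model_score):
    """Toy condensation: keep the neighbors the *current* global model
    ranks as most relevant, and record where each span came from."""
    ranked = sorted(neighbors,
                    key=lambda n: global_model_score(node_id, n),
                    reverse=True)
    kept = ranked[:2]  # fixed budget stands in for a condensation ratio
    spans = [(n, 0, min(32, len(texts[n]))) for n in kept]
    summary = " ".join(texts[n][s:e] for n, s, e in spans)
    return EvidencePack(node_id, kept, spans, summary)

def refresh_round(clients, global_model_score):
    """One federated round: every client re-condenses its local graph with
    the latest global model, instead of reusing a one-shot condensation."""
    packs = {}
    for cid, (graph, texts) in clients.items():
        packs[cid] = [condense(v, nbrs, texts, global_model_score)
                      for v, nbrs in graph.items()]
    return packs
```

Because each `EvidencePack` stores the selected neighbor ids and source spans rather than only the summary string, a client can locally audit any prediction by re-reading the exact text the condensation was built from.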