Large language models (LLMs) have recently shown strong potential for enhancing node-level tasks on text-attributed graphs (TAGs) by providing explanation features. However, their practical use is severely limited by the high computational and monetary cost of repeated LLM queries. To illustrate, naively generating explanations for all nodes of a medium-sized benchmark such as Photo (48k nodes) with a representative method (e.g., TAPE) would take days of processing time. In this paper, we propose Bilevel-Optimized Sparse Querying (BOSQ), a general framework that selectively leverages LLM-derived explanation features to enhance node-level tasks on TAGs. We design an adaptive sparse querying strategy that decides when to invoke the LLM, avoiding redundant or low-gain queries and significantly reducing computational overhead. Extensive experiments on six real-world TAG datasets, covering two types of node-level tasks, demonstrate that BOSQ achieves orders-of-magnitude speedups over existing GraphLLM methods while consistently delivering on-par or superior performance.