Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models

Source: arXiv. Accepted at WSDM 2026. The title was changed from "Computation-friendly graph neural network design by accumulating knowledge on large language models" to "Proficient Graph Neural Network Design by Accumulating Knowledge on Large Language Models".

High-level automation is increasingly critical in AI, driven by rapid advances in large language models (LLMs) and AI agents. However, despite their general reasoning power, LLMs struggle significantly with specialized, data-sensitive tasks such as designing Graph Neural Networks (GNNs). This difficulty arises from (1) inherent knowledge gaps in modeling the intricate, varying relationships between graph properties and suitable architectures, and (2) external noise from misleading descriptive inputs, which often results in generic or even misleading model suggestions. Achieving proficiency in designing data-aware models, defined as the meta-level capability to systematically accumulate, interpret, and apply data-specific design knowledge, remains challenging for existing automated approaches because they construct and apply meta-knowledge inefficiently. To achieve meta-level proficiency, we propose DesiGNN, a knowledge-centered framework that systematically converts past model design experience into structured, fine-grained knowledge priors well suited to meta-learning with LLMs. To account for the inherent variability and external noise, DesiGNN aligns empirical property filtering from extensive benchmarks with adaptive elicitation of literature insights via LLMs. By building solid meta-knowledge that links the understanding of unseen graphs to known effective architecture patterns, DesiGNN can deliver initial model proposals that rank in the top 5.77% for unseen datasets within seconds, and it achieves consistently superior performance at minimal search cost compared to baselines.
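To make the idea of linking an unseen graph to known effective architecture patterns more concrete, here is a minimal sketch of one way such a lookup could work. It is not DesiGNN's implementation: the abstract does not specify the procedure, and the LLM-based elicitation of literature insights is left out entirely. The sketch assumes each benchmark is summarized by a small vector of graph properties, and that the best designs of the most similar benchmark are reused as initial proposals. The property choice, the benchmark names, the architecture labels, and all numeric values are hypothetical placeholders.

```python
import math

# Hypothetical property vectors per benchmark, e.g.
# [average degree, clustering coefficient, edge homophily].
# The numbers are illustrative placeholders, not measurements.
BENCHMARK_PROPERTIES = {
    "bench_a": [3.9, 0.24, 0.81],
    "bench_b": [2.7, 0.14, 0.74],
    "bench_c": [31.1, 0.34, 0.83],
}

# Hypothetical record of the best-performing designs found on each benchmark.
TOP_ARCHITECTURES = {
    "bench_a": ["gcn-2layer-sum", "gat-2layer-mean"],
    "bench_b": ["sage-3layer-max", "gcn-3layer-sum"],
    "bench_c": ["gin-2layer-sum", "gat-1layer-mean"],
}


def standardize(rows):
    """Z-score each column so that no single property dominates the distance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = []
    for col, mean in zip(columns, means):
        variance = sum((x - mean) ** 2 for x in col) / len(col)
        stds.append(math.sqrt(variance) or 1.0)  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def rank_benchmarks(unseen, benchmarks):
    """Return benchmark names ordered from most to least similar to `unseen`."""
    names = list(benchmarks)
    scaled = standardize([benchmarks[name] for name in names] + [unseen])
    target = scaled[-1]
    distance = {name: math.dist(scaled[i], target) for i, name in enumerate(names)}
    return sorted(names, key=distance.get)


def propose_initial_models(unseen, k=1):
    """Collect the recorded top designs of the k most similar benchmarks."""
    proposals = []
    for name in rank_benchmarks(unseen, BENCHMARK_PROPERTIES)[:k]:
        proposals.extend(TOP_ARCHITECTURES[name])
    return proposals


if __name__ == "__main__":
    unseen_graph = [3.8, 0.25, 0.80]  # placeholder properties of a new dataset
    print(rank_benchmarks(unseen_graph, BENCHMARK_PROPERTIES))
    print(propose_initial_models(unseen_graph, k=1))
```

Because the lookup is a nearest-neighbor search over a handful of precomputed vectors, it involves no model training, which is one plausible reading of how initial proposals could be produced within seconds. Which properties actually predict architecture transfer is the hard part, and it is what the abstract's "empirical property filtering" is meant to address.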
