Knowledge Graph (KG) generation requires models to learn complex semantic dependencies between triples while maintaining domain validity constraints. Unlike link prediction, which scores triples independently, generative models must capture interdependencies across entire subgraphs to produce semantically coherent structures. We present ARK (Auto-Regressive Knowledge Graph Generation), a family of autoregressive models that generate KGs by treating graphs as sequences of (head, relation, tail) triples. ARK learns implicit semantic constraints directly from data, including type consistency, temporal validity, and relational patterns, without explicit rule supervision. On the IntelliGraphs benchmark, our models achieve 89.2% to 100.0% semantic validity across diverse datasets while generating novel graphs not seen during training. We also introduce SAIL, a variational extension of ARK that enables controlled generation through learned latent representations, supporting both unconditional sampling and conditional completion from partial graphs. Our analysis reveals that model capacity (hidden dimensionality >= 64) is more critical than architectural depth for KG generation, with recurrent architectures achieving comparable validity to transformer-based alternatives while offering substantial computational efficiency. These results demonstrate that autoregressive models provide an effective framework for KG generation, with practical applications in knowledge base completion and query answering.
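The core idea of treating a KG as a token sequence can be illustrated with a deliberately minimal sketch. This is *not* the authors' ARK implementation: it flattens each graph into a sequence of head/relation/tail tokens and factorizes the joint probability autoregressively, but replaces the learned recurrent or transformer conditionals with toy bigram counts. All names (`BigramKGGenerator`, the example triples) are illustrative assumptions.

```python
import random
from collections import defaultdict

# Toy sketch (not the paper's ARK model): a KG is flattened into a token
# sequence [h1, r1, t1, h2, r2, t2, ..., EOS] and the sequence probability
# is factorized into next-token conditionals. Here the conditionals are
# bigram counts; ARK would use a neural sequence model instead.

EOS = "<eos>"

class BigramKGGenerator:
    def __init__(self, seed=0):
        # counts[prev][tok] = how often `tok` follows `prev` in training data
        self.counts = defaultdict(lambda: defaultdict(int))
        self.rng = random.Random(seed)

    def fit(self, graphs):
        """Each graph is a list of (head, relation, tail) triples."""
        for triples in graphs:
            seq = [tok for triple in triples for tok in triple] + [EOS]
            prev = "<bos>"
            for tok in seq:
                self.counts[prev][tok] += 1
                prev = tok

    def sample(self, max_tokens=30):
        """Ancestral sampling of a new flattened triple sequence."""
        seq, prev = [], "<bos>"
        for _ in range(max_tokens):
            options = self.counts[prev]
            if not options:
                break
            toks = list(options)
            weights = [options[t] for t in toks]
            tok = self.rng.choices(toks, weights=weights)[0]
            if tok == EOS:
                break
            seq.append(tok)
            prev = tok
        # Regroup the flat token stream back into (h, r, t) triples.
        usable = len(seq) - len(seq) % 3
        return [tuple(seq[i:i + 3]) for i in range(0, usable, 3)]

# Tiny synthetic training set; sampling can recombine tokens into graphs
# not seen during training, mirroring the novelty claim in the abstract.
graphs = [
    [("alice", "works_at", "acme"), ("acme", "located_in", "london")],
    [("bob", "works_at", "acme"), ("acme", "located_in", "paris")],
]
gen = BigramKGGenerator()
gen.fit(graphs)
sampled = gen.sample()
print(sampled)
```

Because the bigram conditionals only see the immediately preceding token, this toy version cannot enforce the long-range constraints (type consistency, temporal validity) that motivate the neural conditionals in ARK; it only demonstrates the sequence factorization itself.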