Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior causal knowledge, but existing approaches integrate these priors heuristically, without theoretical grounding. We introduce HOLOGRAPH, a framework that formalizes LLM-guided causal discovery through sheaf theory, representing local causal beliefs as sections of a presheaf over variable subsets. Our key insight is that a coherent global causal structure corresponds to the existence of a global section, while topological obstructions to such coherence manifest as non-vanishing sheaf cohomology. We propose the Algebraic Latent Projection to handle hidden confounders and Natural Gradient Descent on the belief manifold for principled optimization. Experiments on synthetic and real-world benchmarks with 50-100 variables show that HOLOGRAPH achieves competitive causal discovery performance while resting on rigorous mathematical foundations. Our sheaf-theoretic analysis reveals that while the Identity, Transitivity, and Gluing axioms are satisfied to numerical precision (within 10^{-6}), the Locality axiom fails for larger graphs, suggesting fundamental non-local coupling in latent variable projections. Code is available at [https://github.com/hyunjun1121/holograph](https://github.com/hyunjun1121/holograph).
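To make the "global section" idea concrete, here is a minimal sketch. It is not the HOLOGRAPH implementation; every name in it (`restrict`, `glue`) is hypothetical. It assumes each local causal belief is a set of directed edges over a subset of variables, and that the restriction map simply drops edges with an endpoint outside the smaller subset. Under these assumptions, a global section exists exactly when local beliefs agree on every pairwise overlap. When they do, gluing amounts to taking the union of the edge sets. The sketch covers only this gluing step. It does not compute sheaf cohomology, model latent projections, or check acyclicity.

```python
from itertools import combinations


def restrict(edges, subset):
    """Hypothetical restriction map: keep only edges with both endpoints in `subset`."""
    return frozenset((u, v) for (u, v) in edges if u in subset and v in subset)


def glue(local_sections):
    """Try to glue local edge sets into one global edge set.

    `local_sections` maps a frozenset of variables (one element of the cover)
    to the set of directed edges believed to hold among those variables.
    Returns the glued global edge set, or None if gluing fails.
    """
    # Gluing precondition: sections must agree on every pairwise overlap.
    for (U, s_U), (V, s_V) in combinations(local_sections.items(), 2):
        overlap = U & V
        if restrict(s_U, overlap) != restrict(s_V, overlap):
            return None
    # Candidate global section: the union of all local edge sets.
    global_edges = frozenset().union(*local_sections.values())
    # The glued section must restrict back to each local section exactly
    # (this also rejects local sections containing edges outside their subset).
    for U, s_U in local_sections.items():
        if restrict(global_edges, U) != frozenset(s_U):
            return None
    return global_edges


beliefs = {
    frozenset({"A", "B", "C"}): {("A", "B"), ("B", "C")},
    frozenset({"B", "C", "D"}): {("B", "C"), ("C", "D")},
}
print(sorted(glue(beliefs)))  # [('A', 'B'), ('B', 'C'), ('C', 'D')]

# Orienting B-C in opposite directions on the overlap {B, C} blocks gluing.
conflicting = {
    frozenset({"A", "B", "C"}): {("A", "B"), ("B", "C")},
    frozenset({"B", "C", "D"}): {("C", "B"), ("C", "D")},
}
print(glue(conflicting))  # None
```

In this toy setting, a failed gluing is the simplest symptom of incoherent local beliefs. The obstruction analysis described above, including the observed Locality failures, goes beyond pairwise edge agreement and is not reproduced here.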