Causal inference is central to scientific discovery, yet choosing appropriate methods remains challenging because of the complexity of both statistical methodology and real-world data. Inspired by the success of artificial intelligence in accelerating scientific discovery, we introduce InferenceEvolve, an evolutionary framework that uses large language models to discover and iteratively refine causal methods. Across widely used benchmarks, InferenceEvolve yields estimators that consistently outperform established baselines: compared with 58 human submissions in a recent community competition, our best evolved estimator lies on the Pareto frontier across two evaluation metrics. We also develop robust proxy objectives for settings without semi-synthetic outcomes and obtain competitive results. Analysis of the evolutionary trajectories shows that agents progressively discover sophisticated strategies tailored to unrevealed data-generating mechanisms. These findings suggest that language-model-guided evolution can optimize structured scientific programs such as causal inference, even when outcomes are only partially observed.