On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model. Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.
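To make the three kinds of feedback concrete, the following is a minimal sketch of a toy structural causal model in Python. It is not one of the CausalARC tasks: the two binary variables, the XOR mechanism, and the function names (`sample_noise`, `run_scm`, `counterfactual`) are all hypothetical choices made for illustration. The sketch assumes the exogenous noise is known when the counterfactual is computed, so the abduction step reduces to reusing that noise.

```python
import random

# Hypothetical toy SCM (illustration only, not a CausalARC task):
#   X := U_x
#   Y := X XOR U_y
# U_x and U_y are independent exogenous binary noise variables.

def sample_noise(rng):
    """Draw one setting of the exogenous variables."""
    return {"u_x": rng.randint(0, 1), "u_y": rng.randint(0, 1)}

def run_scm(noise, do=None):
    """Evaluate the structural equations in causal order.

    `do` maps variable names to forced values; a forced variable
    ignores its own mechanism, which models a hard intervention.
    """
    do = do or {}
    x = do.get("x", noise["u_x"])
    y = do.get("y", x ^ noise["u_y"])
    return {"x": x, "y": y}

def observational(rng):
    """Sample from the unmodified model."""
    return run_scm(sample_noise(rng))

def interventional(rng, do):
    """Sample from the model under do(...), with fresh noise."""
    return run_scm(sample_noise(rng), do=do)

def counterfactual(noise, do):
    """Re-run the model under do(...) with the SAME noise as the factual world."""
    return run_scm(noise, do=do)

rng = random.Random(0)
noise = sample_noise(rng)
factual = run_scm(noise)
# Counterfactual query: what would Y have been had X taken the other value?
cf = counterfactual(noise, {"x": 1 - factual["x"]})
```

The only difference between the interventional and counterfactual samplers is whether the noise is redrawn or shared with the factual sample. In this toy model, flipping X while holding the noise fixed always flips Y.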