Passive learning of active causal strategies in agents and language models

What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that learning a strategy of first experimenting, then seeking goals, can in principle allow generalization from passive learning. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links that are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. Next, we show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.
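
The "first experiment, then seek goals" strategy can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's agents or environments: the toy environment, the helper names `make_environment` and `experiment_then_exploit`, and the noise model are all hypothetical. What it shows is that one fixed policy, which intervenes on each variable before acting, recovers whichever causal link is present at test time, including a link it has never been shown.

```python
import random
from typing import Callable, Dict


def make_environment(n_vars: int, cause: int, noise: float = 0.1,
                     seed: int = 0) -> Callable[[int], float]:
    """Hypothetical toy environment (an assumption, not the paper's setup).

    There are n_vars candidate variables; only the hidden index `cause`
    affects the outcome. Intervening on a variable returns a noisy outcome.
    """
    rng = random.Random(seed)

    def intervene(var: int) -> float:
        # Outcome is about 1.0 when the true cause is intervened on, else about 0.0.
        base = 1.0 if var == cause else 0.0
        return base + rng.uniform(-noise, noise)

    return intervene


def experiment_then_exploit(intervene: Callable[[int], float], n_vars: int,
                            trials: int = 3) -> int:
    """Hypothetical policy: experiment on every variable, then exploit the best one."""
    # Experiment phase: average the outcome of repeated interventions on each
    # variable to estimate its causal effect.
    effects: Dict[int, float] = {}
    for var in range(n_vars):
        effects[var] = sum(intervene(var) for _ in range(trials)) / trials
    # Exploit phase: choose the variable with the largest estimated effect.
    return max(effects, key=lambda v: effects[v])


# The same policy finds the cause wherever it is placed.
for true_cause in range(5):
    env = make_environment(n_vars=5, cause=true_cause, seed=true_cause)
    print(true_cause, experiment_then_exploit(env, n_vars=5))
```

Because the policy never hard-codes which variable matters, nothing learned about it is tied to the causal links seen in training; the specific link is resolved by interventions at test time, which is the intuition the abstract appeals to.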