Passive learning of active causal strategies in agents and language models

What can be learned about causality and experimentation from passive data? This question is salient given recent successes of passively-trained language models in interactive domains such as tool use. Passive learning is inherently limited. However, we show that purely passive learning can in fact allow an agent to learn generalizable strategies for determining and using causal structures, as long as the agent can intervene at test time. We formally illustrate that learning a strategy of first experimenting, then seeking goals, can allow generalization from passive learning in principle. We then show empirically that agents trained via imitation on expert data can indeed generalize at test time to infer and use causal links which are never present in the training data; these agents can also generalize experimentation strategies to novel variable sets never observed in training. We then show that strategies for causal intervention and exploitation can be generalized from passive data even in a more complex environment with high-dimensional observations, with the support of natural language explanations. Explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, we show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to understand the behaviors and capabilities of language models.
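
The "first experimenting, then seeking goals" strategy described above can be illustrated with a toy sketch. It is not the environment or agent used in this work: the `ToyCausalEnv` class, its single hidden cause, and the hand-written `experiment_then_exploit` policy are all hypothetical, chosen only to show why such a strategy can transfer to causal structures it has never encountered. The policy here is written by hand, whereas in the setting above it would be acquired by imitating expert data.

```python
from dataclasses import dataclass


@dataclass
class ToyCausalEnv:
    """Hypothetical environment: exactly one hidden variable drives the target."""
    n_vars: int
    hidden_cause: int    # index of the causal variable, unknown to the agent
    effect: float = 1.0  # signed strength of the causal link

    def intervene(self, var: int, value: float) -> float:
        """Set one variable (all others stay at 0) and return the target value."""
        return self.effect * value if var == self.hidden_cause else 0.0


def experiment_then_exploit(env: ToyCausalEnv, probe: float = 1.0, budget: float = 10.0):
    """Probe every variable once, then act on the one that moved the target."""
    # Phase 1 (experiment): intervene on each variable and record its effect per unit.
    effects = [env.intervene(v, probe) / probe for v in range(env.n_vars)]

    # Phase 2 (exploit): choose the variable with the largest absolute effect
    # and push it in the direction that increases the target.
    best = max(range(env.n_vars), key=lambda v: abs(effects[v]))
    value = budget if effects[best] >= 0 else -budget
    return best, env.intervene(best, value)


if __name__ == "__main__":
    # The policy never encodes which variable is causal, so the same code
    # handles a structure it was not written (or trained) for.
    env = ToyCausalEnv(n_vars=4, hidden_cause=2, effect=-0.5)
    print(experiment_then_exploit(env))
```

Because the causal variable is identified by intervention at test time and is not memorized, the same procedure works for any choice of `hidden_cause` and either sign of `effect`. This is the intuition behind generalizing to causal links that are absent from the training data.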
