The ability of an agent to perform well in new and unseen environments is a crucial aspect of intelligence. In machine learning, this ability is referred to as strong or out-of-distribution generalization. However, considering only differences in data distributions is not sufficient to fully capture differences between environments. In the present paper, we investigate out-of-variable generalization, which refers to an agent's ability to handle new situations that involve variables never jointly observed before. We expect that this ability is also important for AI-driven scientific discovery: humans, too, explore 'Nature' by probing, observing, and measuring subsets of variables at any one time. Mathematically, this requires efficient re-use of past marginal knowledge, i.e., knowledge over subsets of variables. We study this problem, focusing on prediction tasks that involve observing overlapping, yet distinct, sets of causal parents. We show that the residual distribution of one environment encodes the partial derivative of the true generating function with respect to the unobserved causal parent. Hence, learning from the residual allows zero-shot prediction even when we never observe the outcome variable in the other environment.
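To make the residual claim concrete, the following is a minimal sketch on a toy problem, not the method of the paper. Everything in it is an assumption made for illustration: a linear generating function Y = A·X1 + B·X2 + C·X3 + noise with hypothetical coefficients, mutually independent parents whose distributions are shared across environments, symmetric noise, and a skewed X3. Under these assumptions, the third central moment of the source residual equals C³ times the third central moment of X3, so the partial derivative C can be recovered without ever observing X3 and Y together. The moment-based estimator is one simple choice among many.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def third_central_moment(v):
    m = mean(v)
    return sum((x - m) ** 3 for x in v) / len(v)

def fit_two_regressor_ols(x1, x2, y):
    """Least squares for y ~ w0 + w1*x1 + w2*x2 via the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(x1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    w1 = (s1y * s22 - s2y * s12) / det
    w2 = (s2y * s11 - s1y * s12) / det
    return my - w1 * m1 - w2 * m2, w1, w2

rng = random.Random(0)
N = 100_000
A, B, C = 2.0, -1.0, 1.5  # hypothetical true coefficients (assumption)

def sample_parents():
    # X1, X2 symmetric; X3 skewed so that its third central moment is non-zero.
    return rng.gauss(0, 1), rng.gauss(0, 1), rng.expovariate(1.0)

# Source environment: (X1, X2, Y) are observed; X3 is never recorded.
src = [sample_parents() for _ in range(N)]
x1_s = [p[0] for p in src]
x2_s = [p[1] for p in src]
y_s = [A * p[0] + B * p[1] + C * p[2] + rng.gauss(0, 0.5) for p in src]

w0, w1, w2 = fit_two_regressor_ols(x1_s, x2_s, y_s)
resid = [y - (w0 + w1 * a + w2 * b) for a, b, y in zip(x1_s, x2_s, y_s)]

# Target environment: (X2, X3) are observed; Y is never recorded.
tgt = [sample_parents() for _ in range(N)]
x3_t = [p[2] for p in tgt]
mu3 = mean(x3_t)

# Residual skewness = C^3 * skewness of X3 (symmetric noise contributes nothing).
ratio = third_central_moment(resid) / third_central_moment(x3_t)
c_hat = math.copysign(abs(ratio) ** (1 / 3), ratio)

def predict(x2, x3):
    """Zero-shot estimate of E[Y | x2, x3]: average the source fit over X1,
    then add the first-order correction in the newly observed parent X3."""
    return w0 + w1 * mean(x1_s) + w2 * x2 + c_hat * (x3 - mu3)

print("estimated dY/dX3:", c_hat)
print("zero-shot prediction at (x2=0.3, x3=2.0):", predict(0.3, 2.0))
```

The sketch relies on the generating function being linear in X3, so the first-order correction is exact; for a nonlinear function the same idea would only hold approximately near the mean of X3, and the derivative would depend on the observed parents.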