The ability of an agent to do well in new environments is a critical aspect of intelligence. In machine learning, this ability is known as $\textit{strong}$ or $\textit{out-of-distribution}$ generalization. However, merely considering differences in data distributions is inadequate for fully capturing differences between learning environments. In the present paper, we investigate $\textit{out-of-variable}$ generalization, which pertains to an agent's generalization capabilities concerning environments with variables that were never jointly observed before. This skill closely reflects the process of animate learning: we, too, explore Nature by probing, observing, and measuring $\textit{subsets}$ of variables at any given time. Mathematically, $\textit{out-of-variable}$ generalization requires the efficient re-use of past marginal information, i.e., information over subsets of previously observed variables. We study this problem, focusing on prediction tasks across environments that contain overlapping, yet distinct, sets of causes. We show that after fitting a classifier, the residual distribution in one environment reveals the partial derivative of the true generating function with respect to the unobserved causal parent in that environment. We leverage this information and propose a method that exhibits non-trivial out-of-variable generalization performance when facing an overlapping, yet distinct, set of causal predictors.
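To make the residual claim concrete, here is a minimal numerical sketch, not the paper's method. It rests on assumptions that are not in the abstract: a regression rather than classification setting, a hypothetical generating function `y = sin(x1) + (1 + x1**2) * x3 + eps` that is linear in the unobserved cause `x3`, a skewed `x3` whose third central moment is known (a shifted exponential, with value 2), and symmetric Gaussian noise. Under these assumptions the third central moment of the residual at a fixed `x1` equals the cube of the partial derivative with respect to `x3` times the third central moment of `x3`, so the derivative can be read off without ever observing `x3` alongside `x1`.

```python
import numpy as np

def recover_partial_derivative(x1_values, n_samples=200_000, seed=0):
    """Estimate df/dx3 at each x1 from residuals alone (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    estimates, truths = [], []
    for x1 in x1_values:
        # Unobserved cause: mean 0, variance 1, third central moment 2 (assumed known).
        x3 = rng.exponential(1.0, n_samples) - 1.0
        # Symmetric noise, so it contributes nothing to the third moment.
        eps = rng.normal(0.0, 0.1, n_samples)
        slope = 1.0 + x1 ** 2            # hypothetical true df/dx3 at this x1
        y = np.sin(x1) + slope * x3 + eps
        # The best predictor from x1 alone is the conditional mean of y.
        residual = y - y.mean()
        third_moment = np.mean(residual ** 3)
        # third_moment is approximately slope**3 * 2, so invert with a cube root.
        estimates.append(np.cbrt(third_moment / 2.0))
        truths.append(slope)
    return np.array(estimates), np.array(truths)

estimates, truths = recover_partial_derivative([-1.0, 0.0, 0.5, 2.0])
for est, true in zip(estimates, truths):
    print(f"estimated df/dx3 = {est:.3f}, true df/dx3 = {true:.3f}")
```

The cube root preserves sign, which is why the sketch uses the third moment rather than the variance: the residual variance would only identify the squared derivative, and it would also be confounded by the noise variance.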