In-context learning is a powerful capability of certain machine learning models that arguably underpins the success of today's frontier AI models. However, in-context learning is critically limited to settings where the in-context distribution of interest $p_{\theta}^{ICL}(x|\mathcal{D})$ can be straightforwardly expressed and/or parameterized by the model; for instance, language modeling relies on expressing the next-token distribution as a categorical distribution parameterized by the network's output logits. In this work, we present a more general form of in-context learning without such a limitation, which we call \textit{in-context learning of energy functions}. The idea is instead to learn the unconstrained, arbitrary in-context energy function $E_{\theta}^{ICL}(x|\mathcal{D})$ corresponding to the in-context distribution $p_{\theta}^{ICL}(x|\mathcal{D})$. To do this, we draw on classic ideas from energy-based modeling. We provide preliminary evidence that our method works empirically on synthetic data. Interestingly, to the best of our knowledge, our work contributes the first example of in-context learning in which the input space and output space differ from one another, suggesting that in-context learning is a more general capability than previously realized.
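As a concrete illustration (this assumes only the standard energy-based modeling construction, not the specific training procedure used in this work), the in-context energy function and in-context distribution are related by the Boltzmann form, and the classical maximum-likelihood gradient for energy-based models decomposes into a positive phase on observed data and a negative phase on model samples:
\begin{align}
p_{\theta}^{ICL}(x|\mathcal{D}) &= \frac{\exp\left(-E_{\theta}^{ICL}(x|\mathcal{D})\right)}{\int \exp\left(-E_{\theta}^{ICL}(x'|\mathcal{D})\right)\, dx'}, \\
\nabla_{\theta} \log p_{\theta}^{ICL}(x|\mathcal{D}) &= -\nabla_{\theta} E_{\theta}^{ICL}(x|\mathcal{D}) + \mathbb{E}_{x' \sim p_{\theta}^{ICL}(\cdot|\mathcal{D})}\left[\nabla_{\theta} E_{\theta}^{ICL}(x'|\mathcal{D})\right].
\end{align}
In standard energy-based modeling practice, the expectation in the negative phase is approximated with samples drawn via MCMC (e.g., Langevin dynamics), since the normalizing integral is intractable in general.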