The ability to learn and compose functions is foundational to efficient learning and reasoning in humans, enabling flexible generalizations such as creating new dishes from known cooking processes. Beyond the sequential chaining of functions, existing linguistics literature indicates that humans can grasp more complex compositions involving interacting functions, where the output depends on context changes induced by different function orderings. Extending this investigation into the visual domain, we developed a function learning paradigm to explore the capacity of humans and neural network models to learn and reason with compositional functions under varied interaction conditions. Following brief training on individual functions, human participants were assessed on composing two learned functions, spanning four main interaction types, including cases in which applying the first function creates or removes the context for applying the second. Our findings indicate that humans can make zero-shot generalizations on novel visual function compositions across interaction conditions, demonstrating sensitivity to contextual changes. A comparison with a neural network model on the same task reveals that, through the meta-learning for compositionality (MLC) approach, a standard sequence-to-sequence Transformer can mimic human generalization patterns in composing functions.