Meta-Learning Strategies through Value Maximization in Neural Networks

Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions such as curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks because of the complexity of optimizing through the entire learning process. Here we theoretically investigate optimal strategies in a tractable setting. We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective: discounted cumulative performance throughout learning. We obtain computational tractability by using average dynamical equations for gradient descent, which are available for simple neural network architectures. Our framework accommodates a range of meta-learning and automatic curriculum learning methods in a unified normative setting. We apply this framework to investigate the effect of approximations in common meta-learning algorithms, to infer aspects of optimal curricula, and to compute optimal neuronal resource allocation in a continual learning setting. Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning, followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed for studying the normative benefits of interventions in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by established theories in cognitive neuroscience.
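
To make the objective concrete, the following is a minimal toy sketch of optimizing a control signal against discounted cumulative performance. It is not the paper's model: the scalar learner, the quadratic loss, the choice of a control signal that scales the learning rate, the quadratic control cost, and every name and constant (`discounted_value`, `optimize_control`, `beta`, `gamma`, and so on) are assumptions made purely for illustration. The control schedule is tuned with finite-difference gradient ascent, a simple stand-in for differentiating through the learning dynamics.

```python
# Toy illustration only: a hypothetical scalar learner, not the paper's setup.

def discounted_value(g, w0=0.0, w_star=1.0, lr=0.1, dt=1.0, gamma=0.95, beta=0.05):
    """Discounted cumulative performance for a control schedule g.

    Assumptions (hypothetical): loss = 0.5 * (w - w_star)^2, the control g[t]
    multiplies the learning rate by (1 + g[t]), and control costs beta * g[t]^2.
    """
    w, total = w0, 0.0
    for t, g_t in enumerate(g):
        loss = 0.5 * (w - w_star) ** 2
        # Instantaneous value: negative loss minus control cost, discounted.
        total += (gamma ** t) * (-loss - beta * g_t ** 2) * dt
        # Euler step of the (here deterministic) gradient descent dynamics.
        w = w - dt * lr * (1.0 + g_t) * (w - w_star)
    return total


def optimize_control(steps=40, iters=100, step_size=0.5, eps=1e-5):
    """Projected gradient ascent on the schedule, using central differences."""
    g = [0.0] * steps
    for _ in range(iters):
        grad = []
        for i in range(steps):
            up = g.copy()
            up[i] += eps
            down = g.copy()
            down[i] -= eps
            grad.append((discounted_value(up) - discounted_value(down)) / (2 * eps))
        # Keep the control non-negative (an assumption of this toy problem).
        g = [max(0.0, g_i + step_size * d_i) for g_i, d_i in zip(g, grad)]
    return g
```

Calling `optimize_control()` returns a schedule whose value under `discounted_value` can be compared with the uncontrolled baseline `discounted_value([0.0] * 40)`. In this toy problem the benefit of control shrinks as the error shrinks, so the optimized schedule places more control early in learning; this is a property of the toy setup and should not be read as a reproduction of the paper's results.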
