Optimal Learning Rate Schedule for Balancing Effort and Performance

Learning how to learn efficiently is a fundamental challenge for biological agents and a growing concern for artificial ones. To learn effectively, an agent must regulate its learning speed, balancing the benefits of rapid improvement against the costs of effort, instability, or resource use. We introduce a normative framework that formalizes this problem as an optimal control process in which the agent maximizes cumulative performance while paying a cost for learning. From this objective, we derive a closed-form solution for the optimal learning rate. This solution takes the form of a closed-loop controller that depends only on the agent's current and expected future performance. Under mild assumptions, this solution generalizes across tasks and architectures and reproduces numerically optimized schedules in simulations. In simple learning models, the optimal schedule can also be obtained as an open-loop control solution, which lets us analyze mathematically how agent and task parameters shape it. Because the optimal policy depends on expectations of future performance, the framework predicts how overconfidence or underconfidence influences engagement and persistence, linking the control of learning speed to theories of self-regulated learning. We further show that a simple episodic memory mechanism can approximate the required performance expectations by recalling similar past learning experiences, providing a biologically plausible route to near-optimal behaviour. Together, these results provide a normative and biologically plausible account of learning-speed control that unifies self-regulated learning, effort allocation, and episodic-memory-based estimation within a tractable mathematical framework.
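The abstract does not state the learning model, so the following is only a toy illustration under our own assumptions, not the paper's derivation or solution. Assume the error e_t = 1 - p_t (p_t is performance) shrinks as e_{t+1} = (1 - γ_t)·e_t, each step costs c·γ_t² in effort, and the schedule γ_0..γ_{T-1} maximizes J = Σ_t [(1 - e_{t+1}) - c·γ_t²]. Setting ∂J/∂γ_k = 0 gives γ_k = e_k·R_k / (2c) with R_k = Σ_{t=k}^{T-1} Π_{s=k+1}^{t} (1 - γ_s), which can be rewritten as 2c·γ_k(1 - γ_k) = Σ_{t=k+1}^{T} e_t: in this toy, the rate at step k is set by the error still expected over the rest of the horizon. The sketch below finds the schedule numerically by projected gradient ascent. The function names (rollout, objective, gradient, optimize_schedule) and the values T = 20, c = 15 are hypothetical; choosing 2c > T keeps every γ_k = e_k·R_k/(2c) ≤ T/(2c) below 1, which avoids the trivial "learn everything in one step" corner.

```python
# Toy model (hypothetical, NOT the paper's derivation): error e_t = 1 - performance,
# learning step e_{t+1} = (1 - gamma_t) * e_t, effort cost cost * gamma_t**2 per step.


def rollout(gammas, e0=1.0):
    """Errors e_0..e_T produced by a learning-rate schedule gammas[0..T-1]."""
    errors = [e0]
    for g in gammas:
        errors.append((1.0 - g) * errors[-1])
    return errors


def objective(gammas, cost):
    """Cumulative performance minus effort: sum_t [(1 - e_{t+1}) - cost * gamma_t**2]."""
    errors = rollout(gammas)
    return sum((1.0 - e) - cost * g * g for e, g in zip(errors[1:], gammas))


def gradient(gammas, cost):
    """dJ/dgamma_k = e_k * R_k - 2 * cost * gamma_k, where
    R_k = sum_{t=k}^{T-1} prod_{s=k+1}^{t} (1 - gamma_s), computed backwards."""
    T = len(gammas)
    errors = rollout(gammas)
    R = [1.0] * T  # R[T-1] = 1 (only the t = T-1 term, empty product)
    for k in range(T - 2, -1, -1):
        R[k] = 1.0 + (1.0 - gammas[k + 1]) * R[k + 1]
    return [errors[k] * R[k] - 2.0 * cost * gammas[k] for k in range(T)]


def optimize_schedule(T=20, cost=15.0, lr=0.004, steps=12000, init=0.1):
    """Projected gradient ascent on the whole schedule, keeping each gamma_t in [0, 1]."""
    gammas = [init] * T
    for _ in range(steps):
        grad = gradient(gammas, cost)
        gammas = [min(1.0, max(0.0, g + lr * d)) for g, d in zip(gammas, grad)]
    return gammas


if __name__ == "__main__":
    cost = 15.0
    schedule = optimize_schedule(cost=cost)
    print("optimized schedule:", [round(g, 3) for g in schedule])
    print("J(optimized)    =", round(objective(schedule, cost), 3))
    print("J(constant 0.1) =", round(objective([0.1] * 20, cost), 3))
```

Under these assumptions the optimized rates start high and decay toward the end of the horizon: at the stationary point γ_0 = R_0/(2c) ≥ 1/(2c), while γ_{T-1} = e_{T-1}/(2c) < 1/(2c).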

翻译：学习如何高效学习是生物智能体面临的基本挑战，也是人工智能体日益关注的问题。为实现有效学习，智能体必须调节其学习速度，在快速改进的收益与努力、不稳定性或资源消耗的成本之间取得平衡。我们提出一个规范框架，将该问题形式化为最优控制过程：智能体在最大化累积性能的同时需承担学习成本。基于此目标，我们推导出最优学习率的闭式解，其形式为仅依赖于智能体当前及预期未来性能的闭环控制器。在温和假设下，该解能泛化至不同任务与架构，并在仿真中复现数值优化调度方案。在简单学习模型中，我们可以通过数学分析揭示智能体与任务参数如何将学习率调度塑造为开环控制解。由于最优策略依赖于对未来性能的预期，该框架能够预测过度自信或自信不足如何影响学习投入与持续性，从而将学习速度控制与自我调节学习理论相连接。我们进一步证明，简单的情景记忆机制可通过回忆过往类似学习经验来近似所需的性能预期，为接近最优行为提供了生物学可行的实现路径。这些结果共同构建了学习速度控制的规范性且生物学合理的解释框架，将自我调节学习、努力分配和情景记忆估计统一于可处理的数学体系之中。