Optimal control of the future is the next frontier for AI. Current approaches to this problem are typically rooted in either reinforcement learning or online learning. While powerful, these learning frameworks are mathematically distinct from Probably Approximately Correct (PAC) learning, which has been the workhorse for the recent technological achievements in AI. We therefore build on prior work on prospective learning, an extension of PAC learning (without control) to non-stationary environments (De Silva et al., 2023; Silva et al., 2024; Bai et al., 2026). Here, we further extend the PAC learning framework to address learning and control in non-stationary environments. Using this framework, called "Prospective Control", we prove that, under fairly general assumptions, empirical risk minimization (ERM) asymptotically achieves the Bayes optimal policy. We then consider a specific instance of prospective control, foraging, which is a canonical task for any mobile agent, be it natural or artificial. We show that existing reinforcement learning algorithms fail to learn in these non-stationary environments, and that, even with modifications, they are orders of magnitude less efficient than our prospective foraging agents. Code is available at: https://github.com/neurodata/ProspectiveLearningwithControl.