We tackle the challenge of rapidly adapting an agent's behavior to solve spatiotemporally continuous problems in novel settings. Animals exhibit extraordinary abilities to adapt to new contexts, a capacity unmatched by artificial systems. Instead of focusing on generalization through deep reinforcement learning, we propose viewing behavior as the physical manifestation of a search procedure, where robust problem-solving emerges from an exhaustive search across all possible behaviors. Surprisingly, this can be done efficiently using online modification of a cognitive graph that guides action, challenging the predominant view that exhaustive search in continuous spaces is impractical. We describe an algorithm that implicitly enumerates behaviors by regulating the tight feedback loop between execution of behaviors and mutation of the graph, and provide a neural implementation based on Hebbian learning and a novel high-dimensional harmonic representation inspired by entorhinal cortex. By framing behavior as search, we provide a mathematically simple and biologically plausible model for real-time behavioral adaptation, successfully solving a variety of continuous state-space navigation problems. This framework not only offers a flexible neural substrate for other applications but also presents a powerful paradigm for understanding adaptive behavior. Our results suggest potential advancements in developmental learning and unsupervised skill acquisition, paving the way for autonomous robots to master complex skills in data-sparse environments demanding flexibility.