We present the Temporally Layered Architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast controller and a slow controller to achieve temporal abstraction, allowing each layer to focus on a different timescale. The design draws on the architecture of the human brain, which executes actions at different timescales depending on the environment's demands. Such distributed control is widespread across biological systems because it increases survivability and accuracy in both certain and uncertain environments. We demonstrate that TLA offers several advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency, and distributed control. We present two algorithms for training TLA: (a) closed-loop control, where the fast controller is trained on top of a pre-trained slow controller, which allows better exploration for the fast controller and lets it decide whether to act or not at each timestep; and (b) partially open-loop control, where the slow controller is trained on top of a pre-trained fast controller, which allows open-loop control in which the slow controller either picks a temporally extended action or defers the next n actions to the fast controller. We evaluate our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
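To make the two control schemes concrete, the following is a minimal sketch of how the rollouts might be organized. It is not the authors' implementation: the function names, the `fast_gate` callable, the `slow_period` parameter, the use of `None` to signal deferral, and the scalar toy dynamics are all assumptions introduced for illustration, and the training procedures themselves are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

State = float   # hypothetical scalar state, for illustration only
Action = float  # hypothetical scalar action


@dataclass
class Trace:
    """Records which controller produced each action in a rollout."""
    actions: List[Action] = field(default_factory=list)
    slow_calls: int = 0    # number of times the slow controller was queried
    fast_actions: int = 0  # number of timesteps where the fast controller acted


def closed_loop_rollout(
    slow_policy: Callable[[State], Action],
    fast_gate: Callable[[State, Action], bool],
    fast_policy: Callable[[State], Action],
    step: Callable[[State, Action], State],
    state: State,
    horizon: int,
    slow_period: int,
) -> Tuple[State, Trace]:
    """Assumed closed-loop scheme: the slow controller proposes an action
    every `slow_period` steps; at every timestep the fast controller
    decides whether to act (override) or not (keep the slow action)."""
    trace = Trace()
    slow_action: Action = 0.0
    for t in range(horizon):
        if t % slow_period == 0:
            slow_action = slow_policy(state)
            trace.slow_calls += 1
        if fast_gate(state, slow_action):  # the "act-or-not" decision
            action = fast_policy(state)
            trace.fast_actions += 1
        else:
            action = slow_action
        trace.actions.append(action)
        state = step(state, action)
    return state, trace


def partially_open_loop_rollout(
    slow_policy: Callable[[State], Optional[Action]],
    fast_policy: Callable[[State], Action],
    step: Callable[[State, Action], State],
    state: State,
    horizon: int,
    n: int,
) -> Tuple[State, Trace]:
    """Assumed partially open-loop scheme: the slow controller either
    returns an action that is repeated for the next n steps (open loop),
    or returns None to defer the next n actions to the fast controller."""
    trace = Trace()
    t = 0
    while t < horizon:
        choice = slow_policy(state)
        trace.slow_calls += 1
        for _ in range(min(n, horizon - t)):
            if choice is None:
                action = fast_policy(state)  # deferred to the fast controller
                trace.fast_actions += 1
            else:
                action = choice              # temporally extended action
            trace.actions.append(action)
            state = step(state, action)
            t += 1
    return state, trace
```

In both sketches the `Trace` counters show how often each controller is actually consulted, which is one way to inspect the temporal behavior and compute cost that the abstract refers to.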