Behavior trees provide a modular way to build an overall controller from a set of sub-controllers that solve different sub-problems. These sub-controllers can be created in different ways, for example with classical model-based control or reinforcement learning (RL). If each sub-controller satisfies the preconditions of the next one, the overall controller will achieve the overall goal. However, even if every sub-controller is locally optimal in achieving the preconditions of the next, with respect to some performance metric such as completion time, the overall controller might be far from optimal with respect to that same metric. In this paper we show how the performance of the overall controller can be improved by using approximations of value functions to inform the design of each sub-controller about the needs of the next one. We also show how, under certain assumptions, this leads to a globally optimal controller when the process is applied to all sub-controllers. Finally, this result also holds when some of the sub-controllers are already given, i.e., if we are constrained to use some existing sub-controllers, the overall controller will be globally optimal given this constraint.
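The gap between local and global optimality can be illustrated with a toy example. The sketch below is not the method of the paper; it assumes an obstacle-free grid with unit step cost, a hypothetical sub-controller B whose precondition set contains two cells, and an exact value function `v_b` (time for B to reach the goal from each cell) standing in for the approximation. All names (`choose_entry`, `v_b`, `precondition_b`) are illustrative assumptions.

```python
from typing import Dict, Tuple

State = Tuple[int, int]


def manhattan(a: State, b: State) -> int:
    """Number of unit steps between two cells on an obstacle-free grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def choose_entry(start: State, value_next: Dict[State, int], use_value: bool) -> State:
    """Pick the cell of the next sub-controller's precondition set that A steers to.

    use_value=False: A only minimizes its own time to satisfy the precondition.
    use_value=True:  A minimizes its own time plus the next sub-controller's
                     remaining time from the chosen cell.
    """
    def cost(entry: State) -> int:
        own_time = manhattan(start, entry)
        return own_time + (value_next[entry] if use_value else 0)

    return min(value_next, key=cost)


# Hypothetical scenario (assumption, not from the paper).
start: State = (0, 0)
goal: State = (0, 5)
precondition_b = [(2, 0), (0, 3)]

# Value function of sub-controller B: remaining time to the goal from each
# cell of its precondition set. Exact here; an approximation in general.
v_b: Dict[State, int] = {s: manhattan(s, goal) for s in precondition_b}


def total_time(entry: State) -> int:
    """Completion time of the overall controller when A hands over at entry."""
    return manhattan(start, entry) + v_b[entry]


local_entry = choose_entry(start, v_b, use_value=False)
informed_entry = choose_entry(start, v_b, use_value=True)

print("locally optimal A:", local_entry, "total time", total_time(local_entry))
print("value-informed A: ", informed_entry, "total time", total_time(informed_entry))
```

In this scenario the locally optimal sub-controller hands over at `(2, 0)` after 2 steps, leaving B with 7 more steps (9 in total), while the value-informed one accepts 3 steps to reach `(0, 3)` and finishes in 5. Both choices satisfy B's precondition; only the second accounts for what B needs next.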