Behavior trees offer a modular way to build an overall controller from a set of sub-controllers, each solving a different sub-problem. These sub-controllers can be created in different ways, for example with classical model-based control or reinforcement learning (RL). If each sub-controller satisfies the preconditions of the next sub-controller, the overall controller will achieve the overall goal. However, even if every sub-controller is locally optimal in achieving the preconditions of the next with respect to some performance metric, such as completion time, the overall controller might be far from optimal with respect to that same metric. In this paper we show how the performance of the overall controller can be improved by using approximations of value functions to inform the design of each sub-controller about the needs of the next one. We also show how, under certain assumptions, this leads to a globally optimal controller when the procedure is applied to all sub-controllers. Finally, this result also holds when some of the sub-controllers are already given: if we are constrained to use some existing sub-controllers, the overall controller will be globally optimal given this constraint.
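The gap between local and global optimality can be seen in a toy example. The sketch below is an illustration, not the paper's method: it assumes a small deterministic graph with hand-picked edge costs, two sub-controllers executed in sequence, and an exact value function where the paper uses approximations. The names `cost_to_set` and `rollout` and the graph itself are hypothetical. The first sub-controller must reach the precondition set `{"a", "b"}` of the second, which then drives the state to the goal `"g"`.

```python
import heapq

def cost_to_set(graph, targets, terminal_cost=None):
    """Minimal cost from each node to enter `targets`, plus an optional
    terminal cost charged at the node where the set is entered."""
    terminal_cost = terminal_cost or {}
    # Build reversed edges so Dijkstra can run backwards from the targets.
    reverse = {}
    for u, edges in graph.items():
        for v, c in edges.items():
            reverse.setdefault(v, []).append((u, c))
    dist = {t: terminal_cost.get(t, 0.0) for t in targets}
    heap = [(d, t) for t, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for u, c in reverse.get(v, []):
            if u in targets:
                continue  # the sub-controller stops once the set is entered
            if d + c < dist.get(u, float("inf")):
                dist[u] = d + c
                heapq.heappush(heap, (d + c, u))
    return dist

def rollout(graph, value, start, targets):
    """Follow the greedy policy for `value` until `targets` is entered."""
    node, cost = start, 0.0
    while node not in targets:
        nxt = min(graph[node],
                  key=lambda v: graph[node][v] + value.get(v, float("inf")))
        cost += graph[node][nxt]
        node = nxt
    return node, cost

# Hypothetical task: edge weights are completion times.
graph = {"s": {"a": 1.0, "b": 2.0}, "a": {"g": 5.0}, "b": {"g": 1.0}, "g": {}}
precondition = {"a", "b"}   # precondition set of the second sub-controller

# Value function of the second sub-controller: time to reach the goal.
v2 = cost_to_set(graph, {"g"})

# Locally optimal design: reach the precondition set as fast as possible.
v1_local = cost_to_set(graph, precondition)
arrive_local, t_local = rollout(graph, v1_local, "s", precondition)
local_total = t_local + v2[arrive_local]

# Value-informed design: add v2 as a terminal cost on the precondition set.
v1_informed = cost_to_set(graph, precondition, terminal_cost=v2)
arrive_informed, t_informed = rollout(graph, v1_informed, "s", precondition)
informed_total = t_informed + v2[arrive_informed]

print(arrive_local, local_total)        # a 6.0
print(arrive_informed, informed_total)  # b 3.0
```

The locally optimal first sub-controller enters the precondition set at `a` after one time unit, leaving the second sub-controller a costly path to the goal. Adding the second sub-controller's value function as a terminal cost makes the first one accept a slower entry at `b`, which halves the total completion time in this instance.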