Goal-conditioned policies are generally understood to be "feed-forward" circuits, in the form of neural networks that map from the current state and the goal specification to the next action to take. However, under what circumstances such a policy can be learned and how efficient the policy will be are not well understood. In this paper, we present a circuit complexity analysis for relational neural networks (such as graph neural networks and transformers) representing policies for planning problems, by drawing connections with serialized goal regression search (S-GRS). We show that there are three general classes of planning problems, in terms of the growth of circuit width and depth as a function of the number of objects and planning horizon, providing constructive proofs. We also illustrate the utility of this analysis for designing neural networks for policy learning.
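As a rough illustration of why circuit depth can depend on the planning horizon, the sketch below treats a goal-conditioned policy as a fixed-depth relational network on a toy path-finding problem. It is not the construction used in the paper. The function name `relational_policy`, the graph encoding, and the min-plus update rule are all assumptions made for this example. Each layer applies the same local update to every object in parallel, so information about the goal travels one edge per layer.

```python
from typing import Dict, List, Optional

INF = float("inf")


def relational_policy(
    neighbors: Dict[str, List[str]],
    current: str,
    goal: str,
    depth: int,
) -> Optional[str]:
    """Map (state, goal) to the next node using `depth` message-passing layers.

    Hypothetical toy policy: the state is the agent's current node in a graph,
    the goal is a target node, and the action is the neighbor to move to.
    """
    # Layer 0: each node only knows whether it is the goal.
    value = {node: (0.0 if node == goal else INF) for node in neighbors}

    # Each layer updates all nodes in parallel: a node's value becomes the
    # minimum of its own value and (neighbor value + 1).
    for _ in range(depth):
        value = {
            node: min([value[node]] + [value[n] + 1 for n in neighbors[node]])
            for node in neighbors
        }

    # Readout: move to the neighbor with the smallest estimated distance.
    best = min(neighbors[current], key=lambda n: value[n], default=None)
    if best is None or value[best] == INF:
        return None  # the goal lies outside this network's receptive field
    return best


# A chain of five nodes: a - b - c - d - e.
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

shallow = relational_policy(chain, "a", "e", depth=2)
deep = relational_policy(chain, "a", "e", depth=3)
```

On this chain the goal is four steps from the start, so with two layers no goal information reaches the agent's neighbor and `shallow` is `None`, while three layers suffice and `deep` is `"b"`. In this toy setting the required depth grows with the distance to the goal, which is the kind of dependence on horizon that the analysis characterizes in general.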