Transformers have shown inconsistent success on AI planning tasks, and theoretical understanding of when generalization should be expected has been limited. We take important steps toward addressing this gap by analyzing the ability of decoder-only models to verify whether a given plan correctly solves a given planning instance. To analyze the general setting in which the number of objects -- and thus the effective input alphabet -- grows at test time, we introduce C*-RASP, an extension of C-RASP designed to establish length-generalization guarantees for transformers under simultaneous growth in sequence length and vocabulary size. Our results identify a large class of classical planning domains for which transformers can provably learn to verify long plans, as well as structural properties that significantly affect the learnability of length-generalizable solutions. Empirical experiments corroborate our theory.