Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks that require foresight to plan multiple steps ahead, such as maze navigation. The standard next-token prediction objective offers no explicit mechanism to predict multiple steps ahead or to revisit the path taken so far. In this work, we therefore study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers' maze navigation. We train parameter-matched transformers from scratch, under identical settings, to navigate mazes of varying types and sizes with either standard next-token prediction or MLM-U, an objective that explicitly predicts multiple steps ahead and backwards. We find that MLM-U considerably improves transformers' ability to navigate mazes compared to standard next-token prediction across maze types and complexities. We also find that MLM-U training is 4x more sample efficient and converges 2x faster in terms of GPU training hours than next-token training. For more complex mazes, we find that MLM-U benefits from scaling to larger transformers. Remarkably, transformers trained with MLM-U outperform larger transformers trained with next-token prediction and additional supervision from A* search traces. We hope these findings underscore the promise of learning objectives to advance transformers' capacity for long-term planning.
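The contrast between the two objectives can be sketched in a few lines. This is only an illustration, not the authors' implementation: it assumes MLM-U refers to masked language modeling with a masking rate drawn uniformly at random, which the abstract does not spell out. The maze path encoding, the `<mask>` token, and the function names `next_token_pairs` and `uniform_rate_mask` are hypothetical.

```python
import random

MASK = "<mask>"  # hypothetical mask token, chosen for illustration


def next_token_pairs(tokens):
    """Next-token prediction: each prefix is paired with the single token after it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def uniform_rate_mask(tokens, rng):
    """Assumed MLM-U-style corruption: draw a masking rate uniformly from [0, 1),
    then mask each position independently at that rate.

    Returns the corrupted sequence and a dict {position: original token}
    holding the tokens the model would be asked to recover.
    """
    rate = rng.random()
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


# Hypothetical token encoding of a short maze path.
path = ["start", "(0,0)", "(0,1)", "(1,1)", "(2,1)", "goal"]

pairs = next_token_pairs(path)
corrupted, targets = uniform_rate_mask(path, random.Random(0))
```

Under this assumption, a masked position can lie before or after the visible tokens, so a single training example can require predicting several steps ahead as well as backwards, whereas each next-token pair only ever asks for the one token that follows a prefix.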