In classical AI, perception relies on learning state-based representations, while planning, which can be thought of as temporal reasoning over action sequences, is typically achieved through search. We study whether such reasoning can instead emerge from representations that capture both perceptual and temporal structure. We show that standard temporal contrastive learning, despite its popularity, often fails to capture temporal structure because it relies on spurious features. To address this, we introduce Combinatorial Representations for Temporal Reasoning (CRTR), a method that uses a negative sampling scheme to provably remove these spurious features and facilitate temporal reasoning. CRTR achieves strong results on domains with complex temporal structure, such as Sokoban and Rubik's Cube. In particular, for the Rubik's Cube, CRTR learns representations that generalize across all initial states and allow it to solve the puzzle using fewer search steps than best-first search (BestFS), though with longer solutions. To our knowledge, this is the first method that efficiently solves arbitrary Cube states using only learned representations, without relying on an external search algorithm.
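For readers unfamiliar with the baseline, the following is a minimal sketch of a standard temporal contrastive (InfoNCE) objective with in-batch negatives, written in NumPy. It illustrates only the generic baseline that the abstract refers to; it is not the CRTR method, whose negative sampling scheme is not specified here. The function name `info_nce`, the `temperature` value, and the use of cosine similarity are assumptions made for illustration.

```python
import numpy as np


def info_nce(anchors: np.ndarray, positives: np.ndarray, temperature: float = 0.1) -> float:
    """Generic InfoNCE loss with in-batch negatives (illustrative sketch).

    Row i of `positives` is assumed to be the embedding of a state that occurs
    later in the same trajectory as row i of `anchors`. Every other row in the
    batch is treated as a negative for anchor i.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # (B, B) similarity matrix; entry (i, j) compares anchor i with candidate j.
    logits = (a @ p.T) / temperature

    # Subtract the row-wise max for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The correct "class" for anchor i is candidate i (the diagonal).
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    # Toy check: matched pairs should score a lower loss than mismatched pairs.
    emb = np.eye(4)
    print("aligned loss: ", info_nce(emb, emb))
    print("shuffled loss:", info_nce(emb, np.roll(emb, 1, axis=0)))
```

In this baseline, the negatives for a given anchor typically come from other trajectories in the batch, so an encoder can lower the loss by picking up on features that merely identify the trajectory rather than temporal progress within it. This is the kind of spurious feature the abstract describes.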