Forward-backward (FB) representations provide a powerful framework for learning the successor representation (SR) in continuous spaces by enforcing a low-rank factorization. However, a fundamental spectral mismatch often exists between the high-rank transition dynamics of continuous environments and the low-rank bottleneck of the FB architecture, making accurate low-rank representation learning difficult. In this work, we analyze temporal abstraction as a mechanism for mitigating this mismatch. By characterizing the spectral properties of the transition operator, we show that temporal abstraction acts as a low-pass filter that suppresses high-frequency spectral components. This suppression reduces the effective rank of the induced SR while preserving a formal bound on the resulting value function error. Empirically, we show that this spectral alignment between the SR and the FB bottleneck is a key factor for stable FB learning, particularly at high discount factors, where bootstrapping becomes error-prone. Our results identify temporal abstraction as a principled mechanism for shaping the spectral structure of the underlying MDP and enabling effective long-horizon representations in continuous control.
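The low-pass filtering claim above can be illustrated on a toy finite MDP. The sketch below is a hypothetical example, not the paper's construction: for a transition matrix `P`, replacing `P` with the k-step operator `P**k` shrinks every eigenvalue except the stationary one (since `|lambda| <= 1` implies `|lambda|**k <= |lambda|`), so the induced SR concentrates its spectrum and its entropy-based effective rank drops. The cycle-walk matrix, discount, and abstraction length are illustrative choices.

```python
import numpy as np

n, gamma, k = 50, 0.99, 5  # toy state count, discount, abstraction length

# Lazy random walk on a cycle: a slow-mixing toy transition matrix whose
# eigenvalues 0.5 + 0.5*cos(2*pi*j/n) decay slowly toward 0.
P = (0.5 * np.eye(n)
     + 0.25 * np.roll(np.eye(n), 1, axis=1)
     + 0.25 * np.roll(np.eye(n), -1, axis=1))

def effective_rank(M):
    """Entropy-based effective rank of the singular value spectrum of M."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p)).sum()))

I = np.eye(n)
SR1 = np.linalg.inv(I - gamma * P)                             # 1-step SR
SRk = np.linalg.inv(I - gamma * np.linalg.matrix_power(P, k))  # k-step SR

# Temporal abstraction as a low-pass filter: each eigenvalue lambda of P
# becomes lambda**k in P**k, suppressing every non-stationary mode, so the
# k-step SR has a more concentrated spectrum and a lower effective rank.
print(effective_rank(SR1), effective_rank(SRk))
```

Here the printed effective rank of the k-step SR is noticeably smaller than that of the 1-step SR, consistent with the rank-reduction argument; in the paper's continuous setting, the analogous statement is about the spectrum of the transition operator rather than a finite matrix.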