While deep reinforcement learning (RL) has proven effective at solving complex control tasks, sample efficiency remains a key challenge because strong performance requires large amounts of data. Existing research applies representation learning to make RL more data-efficient, e.g., by learning predictive representations that predict long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which could improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data and learn expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One appealing feature of SPF is that it is simple to implement and does not require storing infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in both sample efficiency and performance.
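To illustrate how a Fourier-domain target over an infinite horizon might be learned without storing the whole future sequence, the following is a minimal sketch and not the authors' implementation. It assumes the target is a discounted discrete-time Fourier transform, F_t(ω) = Σ_{k≥0} γ^k e^{-iωk} s_{t+1+k}; the discount factor `gamma`, the frequency grid `omegas`, and the helper names `discounted_dtft` and `td_target` are all hypothetical choices made for this example. Under that assumption the transform obeys a one-step recursion, F_t(ω) = s_{t+1} + γ e^{-iω} F_{t+1}(ω), so a bootstrapped target needs only the next state and the next prediction. The code checks the recursion numerically on a random stand-in trajectory.

```python
import numpy as np


def discounted_dtft(states, omegas, gamma):
    """Truncated discounted DTFT: sum_k gamma^k * exp(-i*omega*k) * states[k].

    states: (T, d) real array of future states.
    omegas: (M,) array of angular frequencies.
    Returns an (M, d) complex array.
    """
    k = np.arange(states.shape[0])
    # weights[m, k] = gamma^k * exp(-i * omega_m * k)
    weights = (gamma ** k)[None, :] * np.exp(-1j * omegas[:, None] * k[None, :])
    return weights @ states


def td_target(next_state, next_prediction, omegas, gamma):
    """One-step bootstrapped target: s_{t+1} + gamma * exp(-i*omega) * F_{t+1}."""
    phase = np.exp(-1j * omegas)[:, None]
    return next_state[None, :] + gamma * phase * next_prediction


# Hypothetical setup: a random trajectory stands in for environment states.
rng = np.random.default_rng(0)
gamma = 0.9                                # assumed discount factor
omegas = 2 * np.pi * np.arange(8) / 8      # assumed frequency grid
states = rng.normal(size=(400, 3))         # s_{t+1}, s_{t+2}, ...

full = discounted_dtft(states, omegas, gamma)         # F_t from the whole sequence
shifted = discounted_dtft(states[1:], omegas, gamma)  # F_{t+1}
target = td_target(states[0], shifted, omegas, gamma)

# The recursion reproduces the full transform up to floating-point error.
max_gap = float(np.max(np.abs(full - target)))
```

In a learning setting, `shifted` would be replaced by a network's prediction at the next state, which is what would make storing long future sequences unnecessary under this assumed formulation.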