Combining offline and online reinforcement learning (RL) techniques is crucial for efficient and safe learning when data acquisition is expensive. Existing methods replay offline data directly during the online phase, which induces a data distribution shift and, in turn, inefficiency in online fine-tuning. To address this issue, we introduce a new approach, \textbf{E}nergy-guided \textbf{DI}ffusion \textbf{S}ampling (EDIS), which uses a diffusion model to extract prior knowledge from the offline dataset and employs energy functions to distill this knowledge for enhanced data generation in the online phase. Our theoretical analysis shows that EDIS achieves lower suboptimality than either relying solely on online data or directly reusing offline data. EDIS is a plug-in approach and can be combined with existing methods in the offline-to-online RL setting. Applying EDIS to the off-the-shelf methods Cal-QL and IQL yields a notable 20\% average improvement in empirical performance on the MuJoCo, AntMaze, and Adroit environments. Code is available at \url{https://github.com/liuxhym/EDIS}.
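The core mechanism named above, steering a diffusion model's samples with an energy function, can be sketched as follows. This is a minimal illustrative sketch, not the EDIS implementation: the score network, energy gradient, noise schedule, and all names (\texttt{score\_model}, \texttt{energy\_grad}, \texttt{guidance}) are simplified placeholders. The key line is the guided score, where the gradient of an energy function is subtracted from the model's score at each reverse-diffusion step so that generated samples are biased toward low-energy (here, online-distribution-like) regions.

```python
import numpy as np

# Hedged sketch of energy-guided ancestral sampling (DDPM-style).
# Placeholder schedule and networks; not the EDIS codebase.
T = 50
betas = np.linspace(1e-4, 2e-2, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def score_model(x, t):
    # Placeholder score network: score of a standard Gaussian prior,
    # standing in for a diffusion model trained on offline data.
    return -x / (1.0 - alpha_bars[t])

def energy_grad(x):
    # Placeholder energy gradient dE/dx: steers samples toward 1.0,
    # standing in for the learned energy functions in EDIS.
    return x - 1.0

def sample(dim=4, guidance=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)
    for t in reversed(range(T)):
        # Guided score: model score minus the energy gradient.
        s = score_model(x, t) - guidance * energy_grad(x)
        mean = (x + betas[t] * s) / np.sqrt(alphas[t])
        noise = rng.standard_normal(dim) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

x = sample()
```

With `guidance = 0`, this reduces to plain diffusion sampling from the offline prior; increasing `guidance` trades sample fidelity to the offline data for closeness to the energy-specified target distribution.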