In deep Reinforcement Learning (RL), value functions are typically approximated with deep neural networks trained via a mean squared error regression objective to fit the true value function. Recent work has proposed an alternative: replacing the regression objective with a cross-entropy classification objective, which has been shown to improve the performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across domains, since their primary goal was to demonstrate the efficacy of the idea on a broad spectrum of tasks rather than to analyze it in depth. Our work empirically investigates the impact of this replacement in the offline RL setting and analyzes how its different aspects affect performance. Through large-scale experiments across a diverse range of tasks and algorithms, we aim to gain deeper insight into the implications of this approach. Our results show that, for some algorithms, the change yields performance superior to state-of-the-art solutions on certain tasks while remaining comparable on others, whereas for other algorithms it can cause a dramatic performance drop. These findings are important for the further application of the classification objective in research and practical tasks.
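To make the two objectives being compared concrete, the following is a minimal sketch of the standard formulation; the notation ($Q_\theta$, target $y$, bin locations $z_i$, target distribution $p(y)$) is our illustrative choice, not a transcription of any specific paper. The regression approach minimizes

$$\mathcal{L}_{\mathrm{MSE}}(\theta) = \big(Q_\theta(s, a) - y\big)^2,$$

while the classification approach discretizes the value range into $m$ bins with locations $z_1 < \dots < z_m$, has the network output a categorical distribution $\hat{p}^{\theta}(s, a)$ over those bins, and minimizes the cross-entropy

$$\mathcal{L}_{\mathrm{CE}}(\theta) = -\sum_{i=1}^{m} p_i(y)\,\log \hat{p}_i^{\theta}(s, a), \qquad Q_\theta(s, a) = \sum_{i=1}^{m} z_i\,\hat{p}_i^{\theta}(s, a),$$

where $p(y)$ is a fixed projection of the scalar target $y$ onto the bins (e.g., a two-hot or HL-Gauss encoding), and the scalar value estimate is recovered as the expectation of the predicted distribution.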