NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One promising direction for a breakthrough is to use pre-collected datasets, as recent work in robotics, recommender systems, and other domains has done under the umbrella of offline reinforcement learning (ORL). A large-scale NetHack dataset was recently released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that three major obstacles stand in the way of adoption: resource-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.