Dungeons and Data: A Large-Scale NetHack Dataset

From arXiv: 9 pages, published in the Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), Track on Datasets and Benchmarks. New links to hosting location. Revised results, same conclusions.

Recent breakthroughs in the development of agents to solve challenging sequential decision-making problems such as Go, StarCraft, or DOTA have relied on both simulated environments and large-scale datasets. However, progress in this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost of working with them. Here we present the NetHack Learning Dataset (NLD), a large and highly scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and accompanying code for users to record, load, and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms, including online and offline RL as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision-making tasks.
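To give a sense of scale, the rounded counts quoted in the abstract imply an average trajectory length for each part of the dataset. The short sketch below works through that arithmetic; the dictionary keys and the helper name `mean_transitions_per_trajectory` are illustrative choices made here, not part of the NLD code, and the inputs are the abstract's approximate figures rather than exact dataset statistics.

```python
# Rounded counts as stated in the abstract (approximate, not exact statistics).
# The labels below are illustrative names chosen for this sketch.
NLD_PARTS = {
    "human (NAO server, 2009-2020)": {
        "transitions": 10_000_000_000,
        "trajectories": 1_500_000,
    },
    "symbolic bot (NetHack Challenge 2021 winner)": {
        "transitions": 3_000_000_000,
        "trajectories": 100_000,
    },
}


def mean_transitions_per_trajectory(transitions: int, trajectories: int) -> float:
    """Average number of transitions in one trajectory."""
    if trajectories <= 0:
        raise ValueError("trajectories must be positive")
    return transitions / trajectories


if __name__ == "__main__":
    for label, counts in NLD_PARTS.items():
        mean_len = mean_transitions_per_trajectory(**counts)
        print(f"{label}: about {mean_len:,.0f} transitions per trajectory")
```

Under these rounded figures, a human trajectory averages roughly 6,700 transitions, while a bot trajectory averages 30,000, so the bot trajectories are about 4.5 times longer on average.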

