MSC: A Dataset for Macro-Management in StarCraft II

Macro-management is an important and long-studied problem in StarCraft. Various datasets and accompanying methods have been proposed in the last few years, but these datasets have shortcomings that limit their usefulness for academic and industrial research: 1) Some datasets lack standard preprocessing, parsing and feature extraction procedures, as well as predefined training, validation and test sets. 2) Some datasets are designed only for specific tasks in macro-management. 3) Some datasets are either too small or do not have enough labeled data for modern machine learning algorithms such as deep neural networks. As a result, most previous methods are trained with different features and evaluated on different test sets from the same or different datasets, which makes them difficult to compare directly. To advance research on macro-management in StarCraft, we release a new dataset, MSC, based on the SC2LE platform. MSC consists of well-designed feature vectors, predefined high-level actions and the final result of each match. We also split MSC into training, validation and test sets for convenient evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, two of the key tasks in macro-management. We also describe various downstream tasks and analyses of the dataset to support research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.
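
To illustrate why a fixed training, validation and test split makes results comparable, here is a minimal sketch of a reproducible split over replay identifiers. The function name, the 70/10/20 ratios, the seed and the replay names are all illustrative assumptions; they are not taken from MSC, which ships its own predefined split.

```python
import random


def split_replays(replay_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Deterministically split replay IDs into train/val/test lists.

    The ratios and seed are hypothetical defaults for illustration only.
    """
    ids = sorted(replay_ids)          # sort so the result ignores input order
    random.Random(seed).shuffle(ids)  # seeded shuffle keeps the split reproducible
    n_train = round(len(ids) * ratios[0])
    n_val = round(len(ids) * ratios[1])
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],  # remainder goes to the test set
    }


# Hypothetical replay names; real replays would be identified differently.
splits = split_replays([f"replay_{i:04d}" for i in range(100)])
print({name: len(ids) for name, ids in splits.items()})
```

Because the shuffle is seeded and the input is sorted first, every researcher who runs this on the same set of replays gets the same three subsets, which is the property that lets different methods be evaluated on identical test data.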

