Macro-management is an important and long-studied problem in StarCraft. Various datasets and methods have been proposed in the last few years, but these datasets have shortcomings that hinder academic and industrial research: 1) Some datasets lack standard preprocessing, parsing and feature extraction procedures, as well as predefined training, validation and test sets. 2) Some datasets are designed only for specific tasks in macro-management. 3) Some datasets are either too small or do not have enough labeled data for modern machine learning algorithms such as deep neural networks. As a result, most previous methods are trained with different features and evaluated on different test sets from the same or different datasets, which makes them difficult to compare directly. To advance research on macro-management in StarCraft, we release a new dataset, MSC, based on the SC2LE platform. MSC consists of well-designed feature vectors, predefined high-level actions and the final result of each match. We also split MSC into training, validation and test sets for convenient evaluation and comparison. Besides the dataset, we propose a baseline model and present initial baseline results for global state evaluation and build order prediction, two of the key tasks in macro-management. Various downstream tasks and analyses of the dataset are also described to support research on macro-management in StarCraft II. Homepage: https://github.com/wuhuikai/MSC.