The evaluation of procedural content generation (PCG) systems for generating video game levels is a complex and contested topic. Ideally, the field would have access to robust, generalisable and widely accepted evaluation approaches that can be used to compare novel PCG systems to prior work, but consensus on how to evaluate novel systems is currently limited. We argue that the field can benefit from a structured analysis of how procedural level generation systems can be evaluated, and of how these techniques are currently used by researchers. Such an analysis can both document the current state of practice and provide data to justify changes to it. This work aims to provide that analysis by first developing a novel taxonomy of PCG evaluation approaches, and then presenting the results of a survey of recent work in the field through the lens of this taxonomy. The results of this survey highlight several important weaknesses in current practice, which we argue could be substantially mitigated by 1) promoting the use of evaluation-free system descriptions where appropriate, 2) promoting the development of diverse research frameworks, and 3) promoting the reuse of code and methodology wherever possible.