This project proposes and compares two approaches to controlling and optimising play in the Super Mario Bros. (SMB) environment: a Genetic Algorithm (MarioGA) and NeuroEvolution (MarioNE). Not only do we learn to play SMB using these techniques, but we also optimise play under constraints on coin collection and level completion. First, we formalise the SMB agent, for both algorithms, to maximise the total value of collected coins (reward) and the total distance travelled (reward) while finishing the level faster (time penalty). Second, we study MarioGA and its evaluation function (fitness criteria), including its representation, crossover, mutation operator, selection method, the MarioGA loop, and a few other parameters. Third, MarioNE is applied to SMB: a population of artificial neural networks (ANNs) with random weights is generated, and these networks control Mario's actions in the game. Fourth, SMB is further constrained so that the agent must complete the task within a specified time, stay within a limit on rebirths (deaths), and act within a maximum number of allowed moves, while seeking to maximise the total coin value collected. These constraints encourage efficient completion of SMB levels. Finally, we provide a fivefold comparative analysis that includes fitness plots, the ability to finish different levels of World 1, and domain adaptation (transfer learning) of the trained models.
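To make the formulation above concrete, the following is a minimal sketch of a constrained fitness function and a generic GA loop over action sequences. It is not the project's implementation: the weights, limits, action set, tournament selection, one-point crossover, and elitism are all assumptions chosen for illustration, and `EpisodeStats`, `fitness`, and `evolve` are hypothetical names.

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action set; the real controller mapping may differ.
ACTIONS = ["right", "right_jump", "jump", "left", "noop"]


@dataclass
class EpisodeStats:
    coins: int         # total coin value collected
    distance: float    # horizontal distance travelled
    time_used: float   # game time consumed
    deaths: int        # rebirths used
    moves: int         # actions performed


def fitness(stats, w_coin=10.0, w_dist=1.0, w_time=0.5,
            max_time=400.0, max_deaths=3, max_moves=2000):
    """Reward coins and distance, penalise time; all weights and limits are assumed."""
    violated = (stats.time_used > max_time
                or stats.deaths > max_deaths
                or stats.moves > max_moves)
    if violated:
        return float("-inf")  # infeasible episodes rank below every feasible one
    return w_coin * stats.coins + w_dist * stats.distance - w_time * stats.time_used


def evolve(evaluate, genome_len=50, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Generic GA loop; `evaluate` maps an action sequence to a fitness score."""
    rng = random.Random(seed)
    pop = [[rng.choice(ACTIONS) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        elite = max(pop, key=evaluate)
        best_history.append(evaluate(elite))
        next_pop = [elite[:]]  # elitism: keep the best individual unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of size 3 (assumed selection method).
            p1 = max(rng.sample(pop, 3), key=evaluate)
            p2 = max(rng.sample(pop, 3), key=evaluate)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Per-gene mutation: resample the action with small probability.
            child = [rng.choice(ACTIONS) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=evaluate), best_history
```

In practice `evaluate` would run the action sequence in an SMB emulator and pass the resulting statistics to `fitness`; a toy stand-in such as `lambda g: g.count("right")` is enough to exercise the loop. For a neuroevolution variant, the genome would hold network weights rather than actions.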