Reinforcement learning (RL), a field in which Learning Classifier Systems (LCSs) have been applied for many years, is experiencing a resurgence in research interest. However, traditional Michigan approaches tend to evolve large rule bases that are difficult to interpret and that scale poorly to domains beyond standard mazes. A Pittsburgh Genetic Fuzzy System (dubbed Fuzzy MoCoCo) is proposed that utilises both multiobjective and cooperative coevolutionary mechanisms to evolve fuzzy rule-based policies for RL environments. The multiobjective component addresses the trade-off between policy performance and policy complexity. The continuous-state RL environment Mountain Car is used as a testbed for the proposed system. Results show that the system is able to explore the trade-off between policy performance and complexity effectively, and to learn interpretable, high-performing policies that use as few rules as possible.