Interest in reinforcement learning (RL) has recently surged due to the application of deep learning techniques, but these connectionist approaches are opaque compared with symbolic systems. Learning Classifier Systems (LCSs) are evolutionary machine learning systems that can be categorised as eXplainable AI (XAI) due to their rule-based nature. Michigan LCSs are commonly used in RL domains because the alternative Pittsburgh systems (e.g. SAMUEL) suffer from complex algorithmic design and high computational requirements; however, Pittsburgh systems can produce more compact and interpretable solutions than Michigan systems. We aim to develop two novel Pittsburgh LCSs to address RL domains: PPL-DL and PPL-ST. The former acts as a "zeroth-level" system, and the latter revisits SAMUEL's core Monte Carlo learning mechanism for estimating rule strength. We compare our two Pittsburgh systems to the Michigan system XCS across deterministic and stochastic FrozenLake environments. Results show that PPL-ST performs on par with or better than PPL-DL and outperforms XCS in the presence of high levels of environmental uncertainty. Rulesets evolved by PPL-ST can achieve higher performance than those evolved by XCS while being more parsimonious and therefore more interpretable, albeit at higher computational cost. This indicates that PPL-ST is an LCS well suited to producing explainable policies in RL domains.