Artificial intelligence techniques are increasingly applied to control problems, but they often rely on black-box methods whose outputs are not generated transparently. To improve interpretability and transparency in control systems, models can instead be defined as white-box symbolic policies described by mathematical expressions. To perform better in partially observable and volatile environments, we extend the symbolic policies with memory, represented by continuous-time latent variables governed by differential equations. Genetic programming is used for optimisation, yielding interpretable policies that consist of symbolic expressions. Our results show that symbolic policies with memory perform comparably to black-box policies on a variety of control tasks. Furthermore, the benefit of memory in symbolic policies is demonstrated in experiments where memory-less policies fall short. Overall, we present a method for evolving high-performing symbolic policies that offer the interpretability and transparency that black-box models lack.
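To make the idea of a symbolic policy with memory concrete, the following is a minimal sketch, not the policies or implementation from this work. It assumes a single latent variable a whose dynamics da/dt = f(a, y) and readout u = g(a, y) are given by short closed-form expressions. The particular expressions, the function names, and the forward Euler integrator are all assumptions chosen for illustration.

```python
# Hypothetical symbolic expressions, invented for illustration only;
# they are not policies reported in the source.

def latent_dynamics(a, y):
    """da/dt = y - a: the latent variable low-pass filters the observation y."""
    return y - a


def readout(a, y):
    """u = -2 * a: the control signal is read out from the memory state."""
    return -2.0 * a


def run_policy(observations, dt=0.1, a0=0.0):
    """Step the policy through a sequence of observations.

    The latent ODE is integrated with forward Euler (an assumed integrator).
    Returns the list of controls and the final latent state.
    """
    a = a0
    controls = []
    for y in observations:
        # Emit the control from the current memory state, then update memory.
        controls.append(readout(a, y))
        a = a + dt * latent_dynamics(a, y)
    return controls, a


if __name__ == "__main__":
    controls, a_final = run_policy([1.0] * 200)
    print(controls[:3], a_final)
```

Because both expressions are written out in closed form, the behaviour can be read directly from them: here the latent state tracks a smoothed version of the observation, and the control is proportional to that smoothed value. In a genetic programming setting, expressions such as these would be the individuals being evolved.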