Discovering Dynamic Symbolic Policies with Genetic Programming

Artificial intelligence (AI) techniques are increasingly being applied to solve control problems. However, control systems developed with AI are often black boxes: it is unclear how and why they generate their outputs. This lack of transparency is particularly problematic for control tasks, because it complicates the identification of biases or errors, which in turn undermines the user's confidence in the system. To improve the interpretability and transparency of control systems, the black-box structure can be replaced with white-box symbolic policies described by mathematical expressions. Genetic programming offers a gradient-free method to optimise the structure of non-differentiable mathematical expressions. In this paper, we show that genetic programming can be used to discover symbolic control systems. This is achieved by learning a symbolic representation of a function that transforms observations into control signals. We consider both systems that implement static control policies without memory and systems that implement dynamic memory-based control policies. In the latter case, the discovered function becomes the state equation of a differential equation, which allows for evidence integration. Our results show that the discovered symbolic policies perform comparably with black-box policies on a variety of control tasks. Furthermore, the added value of memory in the dynamic policies is demonstrated in experiments where static policies fall short. Overall, we demonstrate that white-box symbolic policies can be optimised with genetic programming, while offering the interpretability and transparency that black-box models lack.
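
To make the distinction between static and dynamic policies concrete, the following minimal sketch contrasts the two on a noisy observation stream. It is an illustration only, not the method or the expressions reported in the paper: the expressions `-2.0 * y` and `da/dt = y - a`, the function names, the forward Euler integrator, and the step size `dt = 0.1` are all assumptions chosen for readability. In the paper's setting, genetic programming would search over such expressions rather than fixing them by hand.

```python
# Hypothetical sketch: a static symbolic policy versus a dynamic one whose
# latent state follows a differential equation. All expressions are assumed
# for illustration; none are taken from the paper.

def static_policy(y):
    # Memoryless policy: the control depends only on the current observation.
    return -2.0 * y


def dynamic_policy_step(a, y, dt):
    # Assumed state equation da/dt = y - a, advanced by one forward Euler step.
    a_next = a + dt * (y - a)
    # Assumed readout: the control is a function of the latent state only.
    u = -2.0 * a_next
    return a_next, u


def rollout(observations, dt=0.1):
    # Apply both policies to the same observation sequence.
    a = 0.0  # initial latent state of the dynamic policy
    static_u, dynamic_u = [], []
    for y in observations:
        static_u.append(static_policy(y))
        a, u = dynamic_policy_step(a, y, dt)
        dynamic_u.append(u)
    return static_u, dynamic_u


# Observations: a constant signal of 1.0 corrupted by an alternating
# disturbance of +0.5 and -0.5.
observations = [1.0 + (0.5 if k % 2 == 0 else -0.5) for k in range(200)]
static_u, dynamic_u = rollout(observations)

# Spread of the control signal over the last 20 steps for each policy.
static_spread = max(static_u[-20:]) - min(static_u[-20:])
dynamic_spread = max(dynamic_u[-20:]) - min(dynamic_u[-20:])
print(f"static spread:  {static_spread:.3f}")
print(f"dynamic spread: {dynamic_spread:.3f}")
```

In this toy setting the static policy passes the disturbance straight through to the control signal, whereas the latent state of the dynamic policy accumulates observations over time and settles near the underlying value. This is one simple reading of the "evidence integration" that a state equation makes possible.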
