Progress in robotics, particularly for robots operating in complex human-centric environments, relies on control solutions driven by machine learning. Because robots are often safety-critical systems, understanding how learning-based controllers make decisions is crucial. This calls for a formal and quantitative understanding of the explanatory factors behind the interpretability of robot learning. In this paper, we study the interpretability of compact neural policies through the lens of disentangled representation. We use decision trees to obtain factors of variation [1] for disentanglement in robot learning; these factors encapsulate skills, behaviors, or strategies for solving tasks. To assess how well networks uncover the underlying task dynamics, we introduce interpretability metrics that measure the disentanglement of learned neural dynamics from three perspectives: concentration of decisions, mutual information, and modularity. Across extensive experimental analysis, we consistently demonstrate the connection between interpretability and disentanglement.
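To make the mutual-information perspective concrete, the following is a minimal sketch and not the paper's actual metric definitions. It assumes that neuron activations are available as a samples-by-neurons array and that each candidate factor of variation has already been reduced to an integer label per sample (for example, a decision-tree leaf index). The function names `discretize`, `mutual_information`, `mi_matrix`, and `gap_scores`, the equal-width binning, and the normalized top-two gap are all illustrative assumptions.

```python
import numpy as np


def discretize(activations, n_bins=10):
    """Bin each neuron's activations (samples x neurons) into equal-width integer bins."""
    a = np.asarray(activations, dtype=float)
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant neurons
    bins = np.floor((a - lo) / span * n_bins).astype(int)
    return np.clip(bins, 0, n_bins - 1)


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two non-negative integer label arrays."""
    x, y = np.asarray(x), np.asarray(y)
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1.0)           # joint histogram of label pairs
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal over x
    py = joint.sum(axis=0, keepdims=True)   # marginal over y
    nz = joint > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def mi_matrix(activations, factors, n_bins=10):
    """Return a (neurons x factors) matrix of mutual information values."""
    binned = discretize(activations, n_bins)
    factors = np.asarray(factors)
    return np.array([[mutual_information(binned[:, i], factors[:, k])
                      for k in range(factors.shape[1])]
                     for i in range(binned.shape[1])])


def gap_scores(mi):
    """Per-neuron normalized gap between the two most informative factors.

    Illustrative score only (assumes at least two factors): values near 1 mean
    a neuron is informative about a single factor, values near 0 mean its
    information is spread across factors.
    """
    top = np.sort(mi, axis=1)[:, ::-1]
    best, second = top[:, 0], top[:, 1]
    safe_best = np.where(best > 0, best, 1.0)
    return np.where(best > 0, (best - second) / safe_best, 0.0)


if __name__ == "__main__":
    # Synthetic example: neuron i tracks binary factor i plus small noise.
    rng = np.random.default_rng(0)
    factors = rng.integers(0, 2, size=(2000, 2))
    activations = factors + 0.05 * rng.normal(size=(2000, 2))
    mi = mi_matrix(activations, factors, n_bins=4)
    print(mi)
    print(gap_scores(mi))
```

Under these assumptions, a neuron whose score is close to 1 would count as well disentangled in this simplified sense, since almost all of its information concerns one factor.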