Nonparametric Additive Value Functions: Interpretable Reinforcement Learning with an Application to Surgical Recovery

We propose a nonparametric additive model for estimating interpretable value functions in reinforcement learning. Learning effective adaptive clinical interventions that rely on digital phenotyping features is a major concern for medical practitioners. With respect to spine surgery, different post-operative recovery recommendations concerning patient mobilization can lead to significant variation in patient recovery. While reinforcement learning has achieved widespread success in domains such as games, recent methods rely heavily on black-box models, such as neural networks. Unfortunately, these methods hinder the ability to examine the contribution each feature makes to the final suggested decision. While such interpretations are easily provided in classical algorithms such as Least Squares Policy Iteration, basic linearity assumptions prevent learning higher-order flexible interactions between features. In this paper, we present a novel method that offers a flexible technique for estimating action-value functions without making explicit parametric assumptions regarding their additive functional form. This nonparametric estimation strategy relies on incorporating local kernel regression and basis expansion to obtain a sparse, additive representation of the action-value function. Under this approach, we are able to locally approximate the action-value function and retrieve the nonlinear, independent contribution of select features as well as joint feature pairs. We validate the proposed approach with a simulation study, and, in an application to spine disease, uncover recovery recommendations that are in line with related clinical knowledge.
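
To make the idea of an additive representation via basis expansion concrete, the following is a minimal illustrative sketch and not the estimator proposed in this paper. It assumes a single fixed action, uses a hinge (truncated-linear) basis, and fits the coefficients by ridge-stabilized least squares; the local kernel weighting, the sparsity penalty, the pairwise interaction terms, and the policy-iteration loop are all omitted. The function names `basis`, `fit_additive`, and `component`, the knot locations, and the synthetic targets are hypothetical choices made only for illustration.

```python
import numpy as np

def basis(x, knots):
    """Hinge basis for one feature: columns [x, (x - k)_+ for each knot k]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])

def fit_additive(X, y, knots, ridge=1e-6):
    """Fit y ~ b0 + sum_j f_j(X[:, j]) by least squares with a small ridge term."""
    n, d = X.shape
    blocks = [basis(X[:, j], knots) for j in range(d)]
    design = np.column_stack([np.ones(n)] + blocks)
    penalty = ridge * np.eye(design.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalized
    return np.linalg.solve(design.T @ design + penalty, design.T @ y)

def component(coef, j, x, knots):
    """Evaluate the fitted contribution f_j of feature j at the points x."""
    width = len(knots) + 1          # number of basis columns per feature
    start = 1 + j * width           # skip the intercept and earlier features
    return basis(x, knots) @ coef[start:start + width]

# Synthetic stand-in for action-value targets under one action (assumed data).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(500, 2))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1]   # nonlinear in feature 0, linear in feature 1

knots = [-0.5, 0.0, 0.5]
coef = fit_additive(X, y, knots)

# Contribution of feature 0 alone; |x| lies in the span of the hinge basis,
# so the recovered curve should track it closely.
f0 = component(coef, 0, np.array([-1.0, 0.0, 1.0]), knots)
print(f0)
```

Because each feature owns its own block of coefficients, the contribution of a single feature can be evaluated and plotted in isolation, which is the sense in which an additive value function is interpretable. A linear model would recover only a straight line for feature 0 here, whereas the expanded basis captures its nonlinear shape.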
