Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes. For example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes that provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
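To make the core idea concrete, the following is a minimal sketch of a context-generated linear policy, not the authors' implementation. It assumes a plain Elman-style recurrent encoder over past (observation, action) pairs, a binary action, and a logistic link; the class name `ContextualizedPolicySketch`, its methods, and all dimensions are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ContextualizedPolicySketch:
    """Hypothetical illustration only; weights are random, not learned."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # Assumed context encoder: a simple RNN over past (observation, action) pairs.
        self.W_in = rng.normal(scale=0.1, size=(hidden_dim, obs_dim + 1))
        self.W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
        # Head that maps the context to linear policy parameters:
        # obs_dim coefficients plus one intercept.
        self.W_out = rng.normal(scale=0.1, size=(obs_dim + 1, hidden_dim))

    def update_context(self, h, obs, action):
        # Fold the newest (observation, action) pair into the context.
        return np.tanh(self.W_in @ np.append(obs, action) + self.W_h @ h)

    def policy_params(self, h):
        # Generate a context-specific linear policy on demand.
        theta = self.W_out @ h
        return theta[:-1], theta[-1]

    def rollout(self, observations, actions):
        h = np.zeros(self.hidden_dim)
        probs, coefs = [], []
        for obs, action in zip(observations, actions):
            # The policy at each step depends only on the history before it.
            w, b = self.policy_params(h)
            probs.append(sigmoid(w @ obs + b))  # linear observation-to-action map
            coefs.append(w)
            h = self.update_context(h, obs, action)
        return np.array(probs), np.array(coefs)


# Toy usage on synthetic data (5 time steps, 3 observed features).
rng = np.random.default_rng(1)
observations = rng.normal(size=(5, 3))
actions = rng.integers(0, 2, size=5)
model = ContextualizedPolicySketch(obs_dim=3, hidden_dim=4)
probs, coefs = model.rollout(observations, actions)
```

In this sketch, each row of `coefs` is the coefficient vector of the linear policy in force at one time step, so inspecting how the rows change over time is the kind of context-specific analysis the abstract describes. The recurrent encoder here stands in for the "recurrent black-box model" mentioned above and could be swapped for another sequence model under the same interface.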