Finding Counterfactually Optimal Action Sequences in Continuous State Spaces

Humans performing tasks that involve a series of dependent actions over time often learn from experience by reflecting on specific cases and points in time where different actions could have led to significantly better outcomes. While recent machine learning methods for retrospectively analyzing sequential decision-making processes promise to help decision makers identify such cases, they have focused on environments with finitely many discrete states. However, in many practical applications, the state of the environment is inherently continuous. In this paper, we aim to fill this gap. We start by formally characterizing a sequence of discrete actions and continuous states using finite-horizon Markov decision processes and a broad class of bijective structural causal models. Building upon this characterization, we formalize the problem of finding counterfactually optimal action sequences and show that, in general, we cannot expect to solve it in polynomial time. Then, we develop a search method based on the $A^*$ algorithm that, under a natural form of Lipschitz continuity of the environment's dynamics, is guaranteed to return the optimal solution to the problem. Experiments on real clinical data show that our method is very efficient in practice and has the potential to offer interesting insights for sequential decision-making tasks.
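To make the notion of a counterfactual action sequence concrete, the following is a minimal sketch in Python. It is not the paper's $A^*$ method: it enumerates every action sequence by brute force, which is only feasible for tiny horizons. The one-dimensional dynamics `f`, the reward, the additive-noise form of the structural causal model, and all function names are assumptions chosen for illustration. Additive noise is one simple way to obtain a bijective model, because the noise can be recovered exactly from an observed transition.

```python
from itertools import product

# Hypothetical 1-D dynamics with additive noise: s_next = f(s, a) + u.
# For a fixed (s, a) the map u -> s_next is a bijection, so the noise
# behind an observed transition can be recovered exactly.
def f(s, a):
    return 0.5 * s + a  # illustrative only; actions a are 0 or 1


def reward(s):
    return -abs(s - 1.0)  # illustrative only; prefers states near 1.0


def abduct_noise(states, actions):
    """Recover the noise values that explain the observed trajectory."""
    return [states[t + 1] - f(states[t], actions[t])
            for t in range(len(actions))]


def counterfactual_return(s0, actions, noise):
    """Replay the episode under other actions, keeping the same noise."""
    s, total = s0, 0.0
    for a, u in zip(actions, noise):
        s = f(s, a) + u
        total += reward(s)
    return total


def best_counterfactual(states, actions, action_set=(0, 1)):
    """Brute-force search over all action sequences of the same length."""
    noise = abduct_noise(states, actions)
    candidates = product(action_set, repeat=len(actions))
    return max(candidates,
               key=lambda seq: counterfactual_return(states[0], seq, noise))
```

Given an observed episode, `best_counterfactual(states, actions)` returns the action sequence that would have maximized the total reward under the same recovered noise. The number of candidates grows exponentially with the horizon, which is why a guided search such as the one described in the abstract is needed in practice.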

