Each year, thousands of patients in need of heart transplants face life-threatening wait times due to organ scarcity. While allocation policies aim to maximize population-level outcomes, current approaches often fail to account for the dynamic arrival of organs and the composition of the candidate waitlist, which hampers efficiency. The United States is transitioning from rigid, rule-based allocation to more flexible, data-driven models. In this paper, we propose a novel framework for non-myopic policy optimization in general online matching. The framework relies on potentials, a concept originally introduced for kidney exchange. We develop scalable and accurate methods for learning potentials that are higher-dimensional and more expressive than those of prior approaches. Our approach is a form of self-supervised imitation learning: the potentials are trained to mimic an omniscient algorithm that has perfect foresight. We focus on the application of heart transplant allocation and demonstrate, using real historical data, that our policies significantly outperform prior approaches -- including the current US status quo policy and the proposed continuous distribution framework -- in optimizing for population-level outcomes. Our analysis and methods come at a pivotal moment in US policy, as the current heart transplant allocation system is under review. Together, they offer a scalable and theoretically grounded path toward more effective organ allocation.