Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate testing at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score rather than minimize the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, which invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach that leverages properties of the $F_1$ score and the duality of policy optimization to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO achieves state-of-the-art diagnostic accuracy (in some cases higher than conventional methods) with up to an $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Blood_Panel].
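To illustrate why the $F_1$ score is preferred over the error rate on imbalanced data, the following minimal sketch compares two hypothetical diagnosis policies on a made-up cohort. The cohort sizes, confusion counts, and the helper names `f1_score` and `error_rate` are assumptions chosen for illustration; they are not taken from the paper's experiments or from the SM-DDPO code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def error_rate(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of misclassified patients."""
    return (fp + fn) / (tp + fp + fn + tn)


# Hypothetical imbalanced cohort: 50 positives among 1000 patients.
# Policy A labels every patient as negative.
policy_a = dict(tp=0, fp=0, fn=50, tn=950)
# Policy B finds 40 of the 50 positives at the price of 60 false alarms.
policy_b = dict(tp=40, fp=60, fn=10, tn=890)

for name, c in [("A", policy_a), ("B", policy_b)]:
    err = error_rate(**c)
    f1 = f1_score(c["tp"], c["fp"], c["fn"])
    print(f"policy {name}: error rate = {err:.3f}, F1 = {f1:.3f}")
```

In this toy setting, policy A has the lower error rate even though it detects no positive case, while policy B has the higher $F_1$ score. This is the kind of mismatch that motivates optimizing $F_1$ directly.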