Dynamic diagnosis is desirable when medical tests are costly or time-consuming. In this work, we use reinforcement learning (RL) to find a dynamic policy that selects lab test panels sequentially based on previous observations, ensuring accurate diagnosis at a low cost. Clinical diagnostic data are often highly imbalanced; therefore, we aim to maximize the $F_1$ score rather than minimize the error rate. However, optimizing the non-concave $F_1$ score is not a classic RL problem, which invalidates standard RL methods. To remedy this issue, we develop a reward shaping approach that leverages properties of the $F_1$ score and the duality of policy optimization to provably find the set of all Pareto-optimal policies for budget-constrained $F_1$ score maximization. To handle the combinatorially complex state space, we propose a Semi-Model-based Deep Diagnosis Policy Optimization (SM-DDPO) framework that is compatible with end-to-end training and online learning. SM-DDPO is tested on diverse clinical tasks: ferritin abnormality detection, sepsis mortality prediction, and acute kidney injury diagnosis. Experiments with real-world data validate that SM-DDPO trains efficiently and identifies all Pareto-front solutions. Across all tasks, SM-DDPO achieves state-of-the-art diagnosis accuracy (in some cases higher than conventional methods) with up to $85\%$ reduction in testing cost. The code is available at [https://github.com/Zheng321/Deep-Reinforcement-Learning-for-Cost-Effective-Medical-Diagnosis].
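To illustrate why the $F_1$ score is preferred over the error rate on imbalanced data, the following minimal sketch compares two hypothetical classifiers on a synthetic cohort. The cohort size, the 5% positive rate, and both classifiers are assumptions chosen for illustration only; none of them come from the paper's experiments or from the SM-DDPO code.

```python
# Illustrative sketch only: synthetic labels, not data from the paper.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of correct predictions, i.e. 1 minus the error rate."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Hypothetical cohort: 100 patients, 5 of whom are truly abnormal.
y_true = [1] * 5 + [0] * 95

# Classifier A (assumed): always predicts "normal".
pred_a = [0] * 100

# Classifier B (assumed): catches all 5 abnormal cases, with 10 false alarms.
pred_b = [1] * 15 + [0] * 85

acc_a, f1_a = accuracy(y_true, pred_a), f1_score(y_true, pred_a)
acc_b, f1_b = accuracy(y_true, pred_b), f1_score(y_true, pred_b)

print(f"A: accuracy={acc_a:.2f}, F1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f}, F1={f1_b:.2f}")
```

In this toy setting the error rate ranks the trivial classifier A (accuracy 0.95) above classifier B (accuracy 0.90), while the $F_1$ score ranks B (0.50) above A (0.00), because A never detects an abnormal case.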