Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($\textsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($\textsf{HR-Bandit}$), which integrates human expertise to enhance performance. $\textsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
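The joint optimization over actions and feature modifications described above can be pictured with a minimal LinUCB-style sketch. This is a hypothetical illustration, not the paper's actual $\textsf{RLinUCB}$ algorithm: the function name `rlinucb_select`, the enumeration of a finite recourse set `recourses`, and the exploration weight `alpha` are all assumptions for exposition.

```python
import numpy as np

def rlinucb_select(theta_hat, A_inv, actions, recourses, alpha=1.0):
    """Hypothetical sketch of an RLinUCB-style joint selection step.

    Scores every (action, feature-modification) pair with a LinUCB
    upper confidence bound and returns the pair with the highest score,
    balancing exploitation (estimated reward) and exploration (bonus).
    """
    best_pair, best_ucb = None, -np.inf
    for a, x_a in enumerate(actions):          # candidate action contexts
        for r, delta in enumerate(recourses):  # candidate feature modifications
            x = x_a + delta                    # context after recourse
            mean = float(theta_hat @ x)        # exploitation: estimated reward
            bonus = alpha * float(np.sqrt(x @ A_inv @ x))  # exploration bonus
            if mean + bonus > best_ucb:
                best_ucb, best_pair = mean + bonus, (a, r)
    return best_pair, best_ucb
```

With `alpha = 0` the rule reduces to greedy selection on the estimated reward; larger `alpha` favors under-explored (action, modification) pairs, which is the exploration-exploitation balance the abstract refers to.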