In human-centric tasks such as healthcare and education, heterogeneity among patients and students calls for personalized treatments and instructional interventions. Reinforcement learning (RL) has been applied to these tasks, and off-policy selection (OPS) is pivotal for closing the loop: it evaluates and selects policies offline, without online interactions. Yet current OPS methods often overlook the heterogeneity among participants. Our work addresses a central challenge in human-centric systems (HCSs): how to select a policy to deploy when a new participant joins the cohort, without access to any prior offline data collected from that participant. We introduce First-Glance Off-Policy Selection (FPS), a novel approach that systematically addresses participant heterogeneity through sub-group segmentation and OPS criteria tailored to each sub-group. By grouping individuals with similar traits, FPS enables personalized policy selection aligned with the characteristics of each participant or group of participants. We evaluate FPS on two important but challenging applications: intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. In these applications, FPS shows significant improvements in students' learning outcomes and in in-hospital care outcomes.
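The idea of segmenting participants into sub-groups and selecting a policy per sub-group can be illustrated with a minimal sketch. This is not the FPS algorithm itself, whose details the abstract does not give. It assumes that sub-group centroids are already available from some clustering step, that candidate policies are state-independent action distributions, and that a plain per-trajectory importance-sampling estimator serves as the OPS criterion. All names (`nearest_group`, `is_value`, `select_per_group`, `first_glance_select`) and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def nearest_group(traits, centroids):
    """Index of the centroid closest to a participant's initial traits."""
    return min(range(len(centroids)), key=lambda g: dist(traits, centroids[g]))


def is_value(trajectory, policy):
    """Per-trajectory importance-sampling estimate of the return under `policy`.

    Each step is (action, behavior_prob, reward); `policy` maps action -> prob
    (an assumed, state-independent simplification).
    """
    weight, ret = 1.0, 0.0
    for action, behavior_prob, reward in trajectory:
        weight *= policy[action] / behavior_prob
        ret += reward
    return weight * ret


def select_per_group(logged, centroids, policies):
    """Pick, for each sub-group, the candidate policy with the best IS estimate."""
    by_group = defaultdict(list)
    for traits, trajectory in logged:
        by_group[nearest_group(traits, centroids)].append(trajectory)
    selection = {}
    for group, trajs in by_group.items():
        scores = {
            name: sum(is_value(t, p) for t in trajs) / len(trajs)
            for name, p in policies.items()
        }
        selection[group] = max(scores, key=scores.get)
    return selection


def first_glance_select(new_traits, centroids, selection):
    """Choose a policy for a new participant using only their initial traits."""
    return selection[nearest_group(new_traits, centroids)]


# Toy offline data (made up for illustration): (initial traits, trajectory).
centroids = [(0.0,), (1.0,)]
policies = {
    "hint_heavy": {"hint": 0.9, "skip": 0.1},
    "hint_light": {"hint": 0.1, "skip": 0.9},
}
logged = [
    ((0.1,), [("hint", 0.5, 1.0)]),
    ((0.2,), [("skip", 0.5, 0.0)]),
    ((0.9,), [("hint", 0.5, 0.0)]),
    ((1.1,), [("skip", 0.5, 1.0)]),
]
selection = select_per_group(logged, centroids, policies)
```

Calling `first_glance_select((0.05,), centroids, selection)` then returns the policy chosen for the sub-group nearest to the new participant, without using any logged data from that participant. In practice the estimator, the clustering, and the policy class would all be richer than this toy setup.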