Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning

Imagine if AI decision-support tools not only complemented our ability to make accurate decisions, but also improved our skills, boosted collaboration, and elevated the joy we derive from our tasks. Despite the potential to optimize a broad spectrum of such human-centric objectives, the design of current AI tools remains focused on decision accuracy alone. We propose offline reinforcement learning (RL) as a general approach for modeling human-AI decision-making to optimize human-AI interaction for diverse objectives. RL can optimize such objectives by tailoring decision support, providing the right type of assistance to the right person at the right time. We instantiated our approach with two objectives, human-AI accuracy on the decision-making task and human learning about the task, and learned decision-support policies from previous human-AI interaction data. We compared the optimized policies against several baselines in AI-assisted decision-making. Across two experiments (N=316 and N=964), our results demonstrated that people interacting with policies optimized for accuracy achieved significantly better accuracy (and even human-AI complementarity) compared to those interacting with any other type of AI support. Our results further indicated that human learning was more difficult to optimize than accuracy, with participants who interacted with learning-optimized policies showing significant learning improvement only at times. Our research (1) demonstrates offline RL to be a promising approach to model human-AI decision-making, leading to policies that may optimize human-centric objectives and provide novel insights about the AI-assisted decision-making space, and (2) emphasizes the importance of considering human-centric objectives beyond decision accuracy in AI-assisted decision-making, opening up the novel research challenge of optimizing human-AI interaction for such objectives.
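To make concrete the idea of learning a decision-support policy from previously logged interactions, the following is a minimal sketch of tabular batch Q-learning over a fixed log. It is an illustration under stated assumptions, not the method used in this work: the state labels, the action set `ACTIONS`, the reward values, and the names `offline_q_learning` and `greedy_policy` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical action set: kinds of AI assistance a policy could choose from.
ACTIONS = ["no_assistance", "recommendation", "explanation_only"]


def offline_q_learning(transitions, gamma=0.9, alpha=0.1, sweeps=200):
    """Batch Q-learning over a fixed log of (state, action, reward, next_state).

    next_state is None when the interaction ends. No new data is collected;
    the same logged transitions are replayed for a number of sweeps.
    """
    q = defaultdict(float)  # maps (state, action) -> estimated return
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                target = reward
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
                target = reward + gamma * best_next
            # Move the estimate a small step toward the bootstrapped target.
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


def greedy_policy(q, state):
    """Pick the assistance type with the highest estimated return."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


# Hypothetical log: the reward is 1.0 when the final human-AI decision was correct.
log = [
    ("low_confidence", "recommendation", 1.0, None),
    ("low_confidence", "no_assistance", 0.0, None),
    ("low_confidence", "explanation_only", 0.0, "high_confidence"),
    ("high_confidence", "no_assistance", 1.0, None),
    ("high_confidence", "recommendation", 0.0, None),
]

q = offline_q_learning(log)
print(greedy_policy(q, "low_confidence"))
print(greedy_policy(q, "high_confidence"))
```

In this sketch, state-action pairs that never appear in the log keep their initial value of zero, so the greedy policy never prefers an untried form of assistance over one with observed positive reward. Practical offline RL methods handle such unseen pairs more carefully, and a different objective (for example, human learning instead of accuracy) would correspond to a different reward signal.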
