Individual human decision-makers may benefit from different forms of support to improve decision outcomes. However, a key question is which form of support will lead to accurate decisions at a low cost. In this work, we propose learning a decision support policy that, for a given input, chooses which form of support, if any, to provide. We consider decision-makers for whom we have no prior information and formalize learning their respective policies as a multi-objective optimization problem that trades off accuracy and cost. Using techniques from stochastic contextual bandits, we propose $\texttt{THREAD}$, an online algorithm to personalize a decision support policy for each decision-maker, and devise a hyper-parameter tuning strategy to identify a cost-performance trade-off using simulated human behavior. We provide computational experiments to demonstrate the benefits of $\texttt{THREAD}$ compared to offline baselines. We then introduce $\texttt{Modiste}$, an interactive tool that provides $\texttt{THREAD}$ with an interface. We conduct human subject experiments to show how $\texttt{Modiste}$ learns policies personalized to each decision-maker and discuss the nuances of learning decision support policies online for real users.
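To make the setup concrete, the following is a minimal sketch of an online learner that picks a form of support for each input while trading off decision error against cost. It is not $\texttt{THREAD}$ itself: the abstract does not specify the algorithm's details, so a simple tabular epsilon-greedy contextual bandit is swapped in. The support forms, their costs, the two input types, the simulated accuracies, and the names `SupportPolicyLearner` and `run_simulation` are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical forms of support and their per-use costs (arbitrary units).
ARMS = ["none", "model_prediction", "expert_consensus"]
COSTS = {"none": 0.0, "model_prediction": 2.0, "expert_consensus": 4.0}

# Hypothetical simulated decision-maker: probability of a correct decision
# for each (input type, form of support) pair.
SIMULATED_ACCURACY = {
    "easy": {"none": 0.95, "model_prediction": 0.95, "expert_consensus": 0.95},
    "hard": {"none": 0.40, "model_prediction": 0.90, "expert_consensus": 0.90},
}


class SupportPolicyLearner:
    """Tabular epsilon-greedy contextual bandit (illustrative, not THREAD)."""

    def __init__(self, arms, costs, trade_off=0.1, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.costs = dict(costs)
        self.trade_off = trade_off  # weight on cost relative to error
        self.epsilon = epsilon      # exploration probability
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.mean_loss = defaultdict(float)

    def greedy(self, context):
        # Form of support with the lowest estimated cost-adjusted loss.
        return min(self.arms, key=lambda a: self.mean_loss[(context, a)])

    def choose(self, context):
        # Try every form of support once per context before exploiting.
        untried = [a for a in self.arms if self.counts[(context, a)] == 0]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return self.greedy(context)

    def update(self, context, arm, was_correct):
        # Scalarized objective: 0/1 decision error plus weighted cost.
        loss = (0.0 if was_correct else 1.0) + self.trade_off * self.costs[arm]
        key = (context, arm)
        self.counts[key] += 1
        # Incremental running mean of the observed loss.
        self.mean_loss[key] += (loss - self.mean_loss[key]) / self.counts[key]


def run_simulation(rounds=20000, seed=0):
    """Learn a policy online against the simulated decision-maker."""
    rng = random.Random(seed)
    learner = SupportPolicyLearner(ARMS, COSTS, seed=seed)
    for _ in range(rounds):
        context = rng.choice(["easy", "hard"])
        arm = learner.choose(context)
        was_correct = rng.random() < SIMULATED_ACCURACY[context][arm]
        learner.update(context, arm, was_correct)
    return {c: learner.greedy(c) for c in ("easy", "hard")}


if __name__ == "__main__":
    print(run_simulation())
```

Under these assumed accuracies and costs, the cost-adjusted expected loss is lowest for no support on easy inputs (0.05, against 0.25 and 0.45) and for the model prediction on hard inputs (0.30, against 0.50 and 0.60), so the learned policy should settle on those choices. Raising `trade_off` makes costly support less attractive, which is the kind of cost-performance trade-off the abstract describes tuning with simulated human behavior.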