As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, new challenges surface in human-AI interaction. One challenge is that AI advice policies become suboptimal when they do not adequately account for humans disregarding AI recommendations, or for the need to provide advice selectively, only when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows or rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from giving advice. We provide learning algorithms that learn the optimal advice policy and give advice only at critical time steps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
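To make the two ingredients of the model concrete, the following is a minimal planning sketch, not the authors' learning algorithm. It assumes a finite discounted MDP in which the transition model `P`, rewards `R`, the human's default policy `pi_h`, and the adherence levels `theta[s][a]` (probability the human follows advice `a` in state `s`) are all known, and solves for the machine's policy by plain value iteration. The machine chooses either to advise an action or to `DEFER`; all names, the toy instance, and the per-advice penalty `advice_cost` (used here so that deferring is preferred when advice adds little) are hypothetical.

```python
from typing import List, Tuple

# Hypothetical sentinel for the machine's "defer" (withhold advice) option.
DEFER = -1


def executed_action_dist(s: int, m: int, theta: List[List[float]],
                         pi_h: List[List[float]]) -> List[float]:
    """Distribution of the action the human actually takes in state s
    when the machine plays m (an advised action index, or DEFER)."""
    if m == DEFER:
        return list(pi_h[s])  # no advice: human follows own default policy
    # Human follows advice m with probability theta[s][m];
    # otherwise falls back to the default policy pi_h[s].
    q = [(1.0 - theta[s][m]) * p for p in pi_h[s]]
    q[m] += theta[s][m]
    return q


def machine_q(s, m, V, P, R, theta, pi_h, gamma, advice_cost) -> float:
    """One-step lookahead value of machine action m in state s."""
    total = 0.0 if m == DEFER else -advice_cost  # hypothetical per-advice cost
    for a, prob in enumerate(executed_action_dist(s, m, theta, pi_h)):
        expected_next = sum(P[s][a][t] * V[t] for t in range(len(V)))
        total += prob * (R[s][a] + gamma * expected_next)
    return total


def solve_advice_policy(P, R, theta, pi_h, gamma=0.9, advice_cost=0.0,
                        tol=1e-10, max_iter=10_000) -> Tuple[List[float], List[int]]:
    """Value iteration over machine actions {DEFER, advise 0, ..., advise n_a-1}."""
    n_s, n_a = len(R), len(R[0])
    machine_actions = [DEFER] + list(range(n_a))  # max() keeps the first tie: DEFER
    V = [0.0] * n_s
    for _ in range(max_iter):
        new_V = [max(machine_q(s, m, V, P, R, theta, pi_h, gamma, advice_cost)
                     for m in machine_actions) for s in range(n_s)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    policy = [max(machine_actions,
                  key=lambda m: machine_q(s, m, V, P, R, theta, pi_h, gamma, advice_cost))
              for s in range(n_s)]
    return V, policy


# Toy instance: two absorbing states, two actions; action 0 earns reward 1.
P = [[[1.0, 0.0], [1.0, 0.0]],   # state 0 always stays in state 0
     [[0.0, 1.0], [0.0, 1.0]]]   # state 1 always stays in state 1
R = [[1.0, 0.0], [1.0, 0.0]]
pi_h = [[0.2, 0.8],   # in state 0 the human usually picks the worse action
        [0.9, 0.1]]   # in state 1 the human usually picks the better one
theta = [[0.5, 0.5], [0.5, 0.5]]  # adherence: advice is followed half the time

V, policy = solve_advice_policy(P, R, theta, pi_h, gamma=0.9, advice_cost=0.1)
# policy == [0, DEFER]: advise action 0 in state 0, defer in state 1
```

In this toy instance the machine advises action 0 in state 0, where the human usually errs and even partially followed advice raises the expected reward, and defers in state 1, where the human's default behavior already outperforms advice once the hypothetical advice cost is charged. Learning `theta` and the dynamics from interaction, as the abstract describes, is outside the scope of this sketch.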