As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, new challenges arise in human-AI interaction. One challenge is that AI policies can be suboptimal when they do not adequately account for humans disregarding AI recommendations. Another is the need for AI to give advice selectively, only when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows or rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from giving advice. We provide learning algorithms that learn the optimal advice policy and give advice only at critical time steps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
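To make the model concrete, here is a minimal planning sketch. It is not the paper's learning algorithm: it runs plain value iteration on a tabular MDP whose dynamics are assumed known. It also assumes a per-state adherence probability `theta[s]`, a fixed human default policy `pi_h`, and a hypothetical per-advice penalty `advice_cost`. That penalty is an assumption of this sketch. Without it, advising the best action never does worse than deferring, so the defer option would never be strictly preferred. At each state the machine either defers, in which case the human acts on `pi_h`, or advises an action `a`. The human follows the advice with probability `theta[s]` and otherwise falls back to `pi_h`. All function and parameter names are illustrative.

```python
import numpy as np


def adherence_aware_value_iteration(P, R, pi_h, theta, gamma=0.9,
                                    advice_cost=0.0, tol=1e-8, max_iter=10_000):
    """Value iteration for a tabular MDP where the machine advises or defers.

    P:           (S, A, S) transition probabilities
    R:           (S, A) expected rewards
    pi_h:        (S, A) the human's own (default) policy
    theta:       (S,) adherence level: probability the human follows advice
    advice_cost: hypothetical penalty paid whenever advice is given

    Returns (V, policy); policy[s] == -1 means "defer", otherwise it is the
    index of the advised action.
    """
    S, A, _ = P.shape
    th = theta[:, None]  # (S, 1) for broadcasting against (S, A)

    def backup(V):
        Q = R + gamma * (P @ V)                  # (S, A): value if the human takes a
        v_defer = (pi_h * Q).sum(axis=1)         # human acts on own policy
        # Advising a: follow with prob theta, else fall back to pi_h.
        q_advise = th * Q + (1 - th) * v_defer[:, None] - advice_cost
        return v_defer, q_advise

    V = np.zeros(S)
    for _ in range(max_iter):
        v_defer, q_advise = backup(V)
        V_new = np.maximum(v_defer, q_advise.max(axis=1))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Greedy policy w.r.t. the final V; ties are resolved in favor of deferring.
    v_defer, q_advise = backup(V)
    best_advice = q_advise.argmax(axis=1)
    policy = np.where(q_advise.max(axis=1) > v_defer, best_advice, -1)
    return V, policy
```

For example, consider a two-state MDP in which state 1 is absorbing with zero reward. In state 0, action 1 pays 1, action 0 pays 0, and the human always picks action 0. With `theta = [0.5, 0.5]` and `advice_cost = 0.1`, advising in state 0 is worth 0.5 * 1 - 0.1 = 0.4 > 0, so the sketch advises there and defers in state 1.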