As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, new challenges arise in human-AI interaction. One is that AI policies can be suboptimal when they fail to account for humans disregarding AI recommendations; another is that the AI should give advice selectively, only when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows, rather than rejects, machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from giving advice. We provide learning algorithms that learn the optimal advice policy and give advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
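To make the two modeling ingredients concrete, the following is a minimal planning sketch, not the paper's learning algorithm. It assumes one plausible formalization that the abstract does not spell out: a small tabular MDP with known transitions `P` and rewards `R`, a fixed human default policy `human_policy`, and a single adherence level `theta`, so that advised action `a` is taken with probability `theta` and the human's own action otherwise. The names `DEFER`, `effective_transition`, and `advice_value_iteration` are hypothetical, and ties are broken in favor of deferring so that advice is issued only where it strictly helps.

```python
DEFER = None  # hypothetical marker meaning "the machine gives no advice"


def effective_transition(P, s, advice, human_action, theta):
    """Next-state distribution in state s under a given advice option.

    Assumption: with probability theta the human takes the advised action,
    otherwise (or when the machine defers) the human takes human_action.
    """
    base = P[s][human_action]
    if advice is DEFER:
        return list(base)
    advised = P[s][advice]
    return [theta * pa + (1 - theta) * pb for pa, pb in zip(advised, base)]


def advice_value_iteration(P, R, human_policy, theta, gamma=0.9, iters=500):
    """Value iteration over advice options (DEFER plus every action).

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    Returns (V, policy), where policy[s] is DEFER or an advised action.
    """
    n_states, n_actions = len(P), len(P[0])
    options = [DEFER] + list(range(n_actions))
    V = [0.0] * n_states
    policy = [DEFER] * n_states
    for _ in range(iters):
        new_V = []
        for s in range(n_states):
            h = human_policy[s]
            best_q, best_opt = None, DEFER
            for opt in options:
                if opt is DEFER:
                    r = R[s][h]
                else:
                    r = theta * R[s][opt] + (1 - theta) * R[s][h]
                probs = effective_transition(P, s, opt, h, theta)
                q = r + gamma * sum(p * v for p, v in zip(probs, V))
                # DEFER is tried first; advice must beat it by a small margin.
                if best_q is None or q > best_q + 1e-12:
                    best_q, best_opt = q, opt
            new_V.append(best_q)
            policy[s] = best_opt
        V = new_V
    return V, policy


# Toy example: the human is stuck in state 0 but already acts well in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
     [[0.0, 1.0], [1.0, 0.0]]]   # state 1: action 0 stays, action 1 moves to 0
R = [[0.0, 0.0], [1.0, 0.0]]     # only staying in state 1 is rewarded
V, policy = advice_value_iteration(P, R, human_policy=[0, 0], theta=0.7)
print(policy)  # [1, None]: advise in state 0, defer in state 1
```

In this toy instance the machine advises only in the state where the human's default action is harmful and defers elsewhere, which is the kind of selective advice the abstract describes. With `theta = 0` advice has no effect, so the sketch defers everywhere.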