Appropriate reliance on AI advice has become a central research theme in human-AI collaboration. Existing frameworks have focused exclusively on AI advice in the form of point predictions. However, set-valued AI advice (e.g., discrete sets or continuous intervals) is increasingly used to communicate uncertainty and improve human decision-making. In this paper, we develop the first formal framework for measuring appropriate reliance on set-valued AI advice within the sequential judge-advisor paradigm, spanning both classification and regression tasks. For classification, we first introduce the dimensions necessary for evaluating set-valued AI advice. We then define two metrics, the correct reliance rate on AI and the correct reliance rate on self, which jointly characterize appropriate reliance in this setting. For regression, we introduce the quantity of AI reliance and the quality of AI reliance, which respectively measure whether a decision maker used the AI advice and whether that reliance moved them closer to the ground truth relative to their initial estimate. By applying our framework, we demonstrate how these metrics capture important nuances in human-AI collaboration that existing measures overlook.