AI systems increasingly assist human decision making by producing preliminary assessments of complex inputs. However, such AI-generated assessments are often noisy or systematically biased, raising a central question: how should costly human effort be allocated to correct AI outputs where it matters most for the final decision? We propose a general decision-theoretic framework for human-AI collaboration in which AI assessments are treated as factor-level signals and human judgments as costly information that can be selectively acquired. We consider cases where the optimal selection problem reduces to maximizing a reward associated with each candidate subset of factors, turning policy design into a reward-estimation problem. We develop estimation procedures under both nonparametric and linear models, covering contextual and non-contextual selection rules. In the linear setting, the optimal rule admits a closed-form expression with a clear interpretation in terms of factor importance and residual variance. We apply our framework to AI-assisted peer review. Our approach substantially outperforms LLM-only predictions and achieves performance comparable to full human review while using only 20-30% of the human information. Across selection rules, we find that simpler rules derived under linear models can substantially reduce computational cost without harming final prediction performance. Our results highlight both the value of human intervention and the efficiency of principled dispatching.