Large language models (LLMs) have emerged as powerful tools in various domains. Recent studies have shown that LLMs can surpass humans in certain tasks, such as predicting the outcomes of neuroscience studies. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human-machine team can surpass either teammate alone when team members' confidence is well-calibrated and they diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated in a neuroscience forecasting task that, even when humans were inferior to LLMs, their combination with one or more LLMs consistently improved team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.
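The combination method described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: we assume each team member reports a binary choice plus a confidence, encode each report as a signed confidence score, and fit per-member logistic-regression weights on labeled items. The encoding, the training data, and the gradient-descent fitting routine are all illustrative assumptions.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain logistic regression (weights + bias) via batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [0.0] * d
        gb = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - yi  # gradient of the log-loss w.r.t. z
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gwj / n for wj, gwj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Team probability that the answer is option 1."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def signed_score(choice, confidence):
    """Encode (choice in {0,1}, confidence in [0.5, 1]) as a value in [-1, 1]."""
    return (2 * choice - 1) * confidence

# Hypothetical labeled items: (human choice, human conf, LLM choice, LLM conf, truth).
train = [
    (1, 0.60, 1, 0.90, 1),
    (0, 0.70, 0, 0.80, 0),
    (1, 0.90, 0, 0.60, 1),
    (0, 0.55, 1, 0.95, 1),
    (0, 0.80, 0, 0.70, 0),
    (1, 0.60, 1, 0.85, 1),
    (0, 0.90, 0, 0.60, 0),
    (1, 0.70, 0, 0.55, 1),
]
X = [[signed_score(h, hc), signed_score(m, mc)] for h, hc, m, mc, _ in train]
y = [t for *_, t in train]

w, b = fit_logistic(X, y)

# Team judgment for a new item: the human weakly favors option 0,
# the LLM strongly favors option 1; the learned weights arbitrate.
p_team = predict(w, b, [signed_score(0, 0.55), signed_score(1, 0.90)])
```

The key design choice is that each member gets its own weight, so a well-calibrated but weaker member still contributes on items where it disagrees confidently with the stronger one; the same encoding extends to any number of team members by adding one feature per member.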