The emergence of Large Language Models (LLMs) has revealed a growing need for human-AI collaboration, especially in creative decision-making scenarios where trust and reliance are paramount. Through human studies and model evaluations on the open-ended News Headline Generation task from the LaMP benchmark, we analyze how the framing and presence of explanations affect user trust and model performance. Overall, we provide evidence that adding an explanation to the model response to justify its reasoning significantly increases self-reported user trust in the model when the user has the opportunity to compare multiple responses. The position and faithfulness of these explanations are also important factors. However, these gains disappear when users are shown responses independently, suggesting that humans trust all model responses, including deceptive ones, equally when each is shown in isolation. Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.