Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations of AI models (e.g., common benchmarks) do not. Instead, they incorporate human factors only in limited ways, assessing the safety of models in isolation and thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions, or the process and outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over the costs, replicability, and unrepresentativeness of HIEs.