Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors only in limited ways, assessing the safety of models in isolation and thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions, that is, the process and outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over the costs, replicability, and unrepresentativeness of HIEs.