Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations of AI models (e.g., common benchmarks) do not. Instead, they incorporate human factors only in limited ways, assessing the safety of models in isolation and thereby failing to capture the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on assessing human-model interactions, i.e., the processes and outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, to assess direct human impact and interaction-specific harms, and to guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over the costs, replicability, and unrepresentativeness of HIEs.