Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations of AI models (e.g., common benchmarks) do not. Instead, they incorporate human factors only in limited ways, assessing the safety of models in isolation and thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on assessing human-model interactions, or the processes and outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over the costs, replicability, and unrepresentativeness of HIEs.