Human Factors, Cognitive Engineering, and Human-Automation Interaction (HAI) form a trifecta in which users and technological systems of ever-increasing autonomous control occupy a central position. But with great autonomy comes great responsibility. In this context, we propose metrics and a benchmark framework grounded in established regimes from Artificial Intelligence (AI). A benchmark is a set of tests or tasks together with the metrics or measurements conducted on them. We hypothesise possible tasks designed to assess operator-system interactions and both the front-end and back-end components of HAI applications. Here, the front-end refers to the user interface and the direct interactions a user has with a system, while the back-end comprises the underlying processes and mechanisms that support the front-end experience. By evaluating HAI systems through the proposed metrics, which draw on Cognitive Engineering studies of judgment and prediction, we attempt to unify many known taxonomies and design guidelines for HAI systems within a single benchmark. We do so by providing a structured approach to quantifying the efficacy and reliability of these systems, formalised in the spirit of recent rapid developments in AI benchmarking techniques. In this way, we aim to steer design principles towards a testable benchmark capable of reproducible results: one that is future-proof, general, and insightful across both the cognitive and technological stacks of any HAI application.