Multimodal user interfaces increasingly combine speech, gesture, vision, gaze, touch, biosignals, and other sensor data. Toolkits from the past five years, such as Geno, Multisensor-Pipeline (MSP), ReactGenie, and EmoSync, aim to make such interfaces easier to prototype, while older work such as WAMI shows how early web-based multimodal systems were conceived. Yet the field still lacks a systematic, reusable way to compare what these toolkits actually support, how much implementation work they take off developers, and which evaluation strategies suit them. This paper reworks an HCI seminar draft into a benchmarking framework for multimodal user interface toolkits. Rather than reporting completed empirical results, it proposes a structured benchmark that combines document analysis, technical comparison, and a planned developer-based evaluation. The framework is organized around three dimensions: (1) modality coverage and interaction abstraction, (2) developer experience and workflow, and (3) experimental and integration support. The paper illustrates the framework with five representative toolkits: Geno, MSP, ReactGenie, WAMI, and EmoSync. The contribution is a reusable benchmark template that future researchers can instantiate with empirical measurements, developer studies, and additional multimodal toolkits.