Computer Use Agents (CUAs) operate interfaces by pointing, clicking, and typing, mirroring the interactions of sighted users (SUs), who can therefore monitor CUAs and share control with them. CUAs do not, however, reflect the interactions of blind and low-vision users (BLVUs), who rely on assistive technology (AT), so BLVUs cannot easily collaborate with CUAs. To characterize this accessibility gap, we present A11y-CUA, a dataset of BLVUs and SUs performing 60 everyday tasks, comprising 40.4 hours of interaction and 158,325 events. Our analysis of the collected interaction traces quantitatively confirms distinct interaction styles between the SU and BLVU groups (mouse- vs. keyboard-dominant) and shows interaction diversity within each group (sequential vs. shortcut navigation for BLVUs). We then compare the collected traces to state-of-the-art CUAs under default and AT conditions (keyboard-only, magnifier). The default CUA completed 78.3% of tasks successfully. Under the AT conditions, however, CUA performance dropped to 41.67% (keyboard-only) and 28.3% (magnifier), and CUA behavior did not reflect the nuances of real AT use. With our open A11y-CUA dataset, we aim to promote collaborative and accessible CUAs for everyone.