Machine learning-based systems owe their ubiquity to well-labeled ground truth data, and they depend on substantial amounts of correctly labeled data. Unfortunately, crowdsourced labeling is time-consuming and expensive. To reduce the effort and tedium of labeling, we designed CAL, a novel interface to aid in data labeling. Our key design decisions for CAL include preventing inapt labels from being selected, guiding users toward an appropriate label when they need assistance, incorporating labeling documentation into the interface, and providing an efficient means to view previous labels. We built a production-quality implementation of CAL and report a user study that compares CAL to a standard spreadsheet. Key findings of our study are that users of CAL reported lower cognitive load without an increase in task time, rated CAL as easier to use, and preferred CAL over the spreadsheet.