Intensive care units (ICUs) are complex and data-rich environments. Data routinely collected in ICUs provide tremendous opportunities for machine learning, but their use comes with significant challenges. Complex problems may require additional input from humans, which can be provided through a process of data annotation. Annotation is a complex, time-consuming process that requires both domain expertise and technical proficiency, and existing data annotation tools fail to provide an effective solution to this problem. In this study, we investigated clinicians' approach to the annotation task. We focused on establishing the characteristics of the annotation process in the context of clinical data and on identifying differences in the annotation workflow between staff roles. The overall goal was to elicit requirements for a software tool that could facilitate effective and time-efficient data annotation. We conducted an experiment in which ICU clinicians annotated printed sheets of data. The participants were observed during the task, and their actions were analysed in the context of Norman's Interaction Cycle to establish requirements for a digital tool. The annotation process followed a constant loop of annotation and evaluation, during which participants incrementally analysed and annotated the data. No distinguishable differences were identified between how different staff roles annotate data, although preferences for particular annotation methods varied between participants and admissions. In summary, through a manual data annotation activity we characterised the clinicians' approach to annotation and elicited 11 key requirements for effective data annotation software in the healthcare setting.