Active Learning (AL) is a human-in-the-loop framework for interactively and adaptively labeling data instances, enabling significant gains in model performance compared to random sampling. AL approaches select the hardest instances to label, often relying on notions of diversity and uncertainty. However, we believe that current AL paradigms do not leverage the full potential of human interaction made possible by automated label suggestions. Indeed, we show that for many classification tasks and datasets, most people take $3\times$ to $4\times$ less time to verify that an automatically suggested label is correct than to change an incorrect suggestion to the correct label (or to label from scratch without any suggestion). Building on this result, we propose CLARIFIER (aCtive LeARnIng From tIEred haRdness), an Interactive Learning framework that makes more effective use of human interaction by exploiting the reduced cost of verification. CLARIFIER targets the hard (uncertain) instances with existing AL methods, the intermediate instances with a novel label suggestion scheme that uses submodular mutual information functions on a per-class basis, and the easy (confident) instances with highest-confidence auto-labeling. In doing so, it improves over existing AL approaches on multiple datasets, particularly those with a large number of classes, by almost $1.5\times$ to $2\times$ in terms of relative labeling cost.
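To make the tiered idea concrete, the following is a minimal sketch and not the CLARIFIER implementation. It assumes each unlabeled instance comes with a vector of predicted class probabilities, and it splits instances into hard, intermediate, and easy tiers using two hypothetical confidence thresholds (`low`, `high`). The abstract's intermediate tier relies on submodular mutual information functions; plain confidence thresholding is used here only as a simple stand-in. The cost model is likewise an assumption: verifying a correct suggestion costs one unit, while correcting a wrong suggestion or labeling from scratch costs three units, in line with the $3\times$ to $4\times$ gap reported above. The names `assign_tiers` and `labeling_cost` are illustrative.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass
class TierAssignment:
    hard: List[int]          # labeled from scratch by the annotator
    intermediate: List[int]  # annotator verifies a suggested label
    easy: List[int]          # auto-labeled, no human effort


def assign_tiers(
    probs: Sequence[Sequence[float]],
    low: float = 0.5,    # hypothetical threshold, not from the source
    high: float = 0.95,  # hypothetical threshold, not from the source
) -> TierAssignment:
    """Split instance indices into tiers by top-class confidence."""
    hard, intermediate, easy = [], [], []
    for i, p in enumerate(probs):
        confidence = max(p)
        if confidence < low:
            hard.append(i)
        elif confidence < high:
            intermediate.append(i)
        else:
            easy.append(i)
    return TierAssignment(hard, intermediate, easy)


def labeling_cost(
    tiers: TierAssignment,
    suggestion_correct: Mapping[int, bool],
    verify_cost: float = 1.0,  # assumed unit cost of verifying
    label_cost: float = 3.0,   # assumed cost of correcting or labeling
) -> float:
    """Total human cost under the assumed verify/label cost model."""
    cost = len(tiers.hard) * label_cost
    for i in tiers.intermediate:
        # A correct suggestion only needs verification; a wrong one
        # is assumed to cost as much as labeling from scratch.
        cost += verify_cost if suggestion_correct[i] else label_cost
    return cost  # easy-tier instances add no human cost


probs = [
    [0.40, 0.35, 0.25],  # low confidence: hard
    [0.70, 0.20, 0.10],  # medium confidence: intermediate
    [0.97, 0.02, 0.01],  # high confidence: easy
    [0.60, 0.30, 0.10],  # medium confidence: intermediate
]
tiers = assign_tiers(probs)
tiered_cost = labeling_cost(tiers, {1: True, 3: False})
scratch_cost = len(probs) * 3.0  # every instance labeled from scratch
```

In this toy setting the tiered scheme costs 7 units against 12 for labeling everything from scratch. The gap depends entirely on the assumed thresholds, costs, and suggestion accuracy, so it should not be read as a reproduction of the reported results.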