With the rapid advancement of machine learning models for NLP tasks, collecting high-fidelity labels from AI models has become a realistic possibility. Firms now make AI available to customers via predictions as a service (PaaS), including PaaS products for healthcare. It is unclear whether these labels can be used to train a local model without expensive annotation checking by in-house experts. In this work, we propose a new framework for Human Correction of AI-Generated Labels (H-COAL). By ranking AI-generated outputs, one can selectively correct labels and approach gold-standard performance (100% human labeling) with significantly less human effort. We show that correcting 5% of labels closes the AI-human performance gap by up to 64% in relative terms, and correcting 20% of labels closes it by up to 86%.
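The selective-correction idea and the relative gap metric can be illustrated with a minimal sketch. The abstract does not state how outputs are ranked, so the sketch assumes each AI label comes with a confidence score and that the least-confident labels are sent to human annotators first. The function names (`select_for_correction`, `apply_corrections`, `gap_closed`) are hypothetical and are not taken from the paper.

```python
def select_for_correction(confidences, fraction):
    """Return indices of the least-confident AI labels, within a human budget.

    Assumption: ranking is by ascending confidence score; the paper's
    actual ranking criterion may differ.
    """
    budget = int(len(confidences) * fraction)
    order = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return order[:budget]


def apply_corrections(ai_labels, human_labels, indices):
    """Replace the AI label with the human label at each selected index."""
    corrected = list(ai_labels)
    for i in indices:
        corrected[i] = human_labels[i]
    return corrected


def gap_closed(ai_score, corrected_score, human_score):
    """Fraction of the AI-human performance gap recovered after correction."""
    return (corrected_score - ai_score) / (human_score - ai_score)


# Toy data, invented for illustration only.
confidences = [0.9, 0.2, 0.8, 0.4, 0.95]
ai_labels = ["a", "b", "a", "b", "a"]
human_labels = ["a", "a", "a", "a", "a"]

picked = select_for_correction(confidences, fraction=0.4)
training_labels = apply_corrections(ai_labels, human_labels, picked)
```

Under this reading, a model trained on `training_labels` would be scored against models trained on pure AI labels and on pure human labels, and `gap_closed` would express the result as the share of the gap recovered. With illustrative scores of 0.80 (AI labels), 0.864 (corrected), and 0.90 (human labels), the function returns roughly 0.64.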