In supervised learning, models are trained to extract correlations from a static dataset. This often leads to models that rely on high-level misconceptions. To prevent such misconceptions, we must provide additional information beyond the training data. Existing methods incorporate forms of additional instance-level supervision, such as labels for spurious features or additional labeled data from a balanced distribution. Such strategies can become prohibitively costly for large-scale datasets, since they require additional annotation at a scale close to that of the original training data. We hypothesize that targeted natural language feedback about a model's misconceptions is a more efficient form of additional supervision. We introduce Clarify, a novel interface and method for interactively correcting model misconceptions. Through Clarify, users need only provide a short text description of a model's consistent failure patterns. Then, in an entirely automated way, we use such descriptions to improve the training process by reweighting the training data or gathering additional targeted data. Our user studies show that non-expert users can successfully describe model misconceptions via Clarify, improving worst-group accuracy by an average of 17.1% across two datasets. Additionally, we use Clarify to find and rectify 31 novel hard subpopulations in the ImageNet dataset, improving minority-split accuracy from 21.1% to 28.7%.
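To make the reweighting step concrete, the following is a minimal sketch of one way a text description could be turned into per-example training weights. It is not taken from the Clarify implementation. It assumes that some upstream component (for example, an image-text similarity model with a threshold) has already produced a boolean flag per training example indicating whether the example matches the user's description. The function name `group_balanced_weights` and the toy data are hypothetical.

```python
from collections import Counter


def group_balanced_weights(labels, matches_description):
    """Return per-example weights so every (label, match) group has equal total weight.

    labels: class label of each training example.
    matches_description: assumed boolean flag per example, True if the example
        matches the user's text description (computed elsewhere; hypothetical).
    """
    if len(labels) != len(matches_description):
        raise ValueError("labels and matches_description must have the same length")
    groups = list(zip(labels, matches_description))
    counts = Counter(groups)
    n_examples = len(groups)
    n_groups = len(counts)
    # Each group gets total weight n_examples / n_groups, split evenly among its members,
    # so rare groups (e.g. examples that break the spurious correlation) are upweighted.
    return [n_examples / (n_groups * counts[g]) for g in groups]


# Toy data: most "bird" examples match the description, one does not.
labels = ["bird", "bird", "bird", "bird", "fish", "fish"]
matches = [True, True, True, False, False, False]
weights = group_balanced_weights(labels, matches)
```

In this toy case the single "bird" example that does not match the description receives the largest weight, which is the intended effect of balancing groups. Such weights could be passed to a weighted loss or a weighted sampler; which of these Clarify uses is not stated in the abstract.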