Where Does My Model Underperform? A Human Evaluation of Slice Discovery Algorithms

Machine learning (ML) models that achieve high average accuracy can still underperform on semantically coherent subsets (i.e. "slices") of data. This behavior can have significant societal consequences for the safety or bias of the model in deployment, but identifying these underperforming slices can be difficult in practice, especially in domains where practitioners lack access to group annotations to define coherent subsets of their data. Motivated by these challenges, ML researchers have developed new slice discovery algorithms that aim to group together coherent and high-error subsets of data. However, there has been little evaluation focused on whether these tools help humans form correct hypotheses about where (for which groups) their model underperforms. We conduct a controlled user study (N = 15) where we show 40 slices output by two state-of-the-art slice discovery algorithms to users, and ask them to form hypotheses about where an object detection model underperforms. Our results provide positive evidence that these tools provide some benefit over a naive baseline, and also shed light on challenges faced by users during the hypothesis formation step. We conclude by discussing design opportunities for ML and HCI researchers. Our findings point to the importance of centering users when designing and evaluating new tools for slice discovery.

翻译：机器学习模型在取得高平均准确率的同时，仍可能在语义连贯的数据子集（即“切片”）上表现不佳。这种行为可能在部署中对模型的安全性或偏见产生重大社会影响，但在实践中识别这些表现不佳的切片十分困难，尤其是在从业者缺乏组标注来定义数据连贯子集的领域。受这些挑战的推动，机器学习研究人员开发了新的切片发现算法，旨在将连贯且误差较高的数据子集归为一组。然而，目前鲜有评估关注这些工具是否能够帮助人类对其模型在哪些群体上表现不佳形成正确假设。我们开展了一项受控用户研究（N=15），向用户展示两个最先进切片发现算法输出的40个切片，并要求他们形成关于目标检测模型在哪些方面表现不佳的假设。我们的结果提供了积极证据，表明这些工具相比朴素基线具有优势，同时也揭示了用户在假设形成步骤中面临的挑战。最后，我们讨论了机器学习与人机交互研究者的设计机遇。我们的研究结果强调了在设计和评估新的切片发现工具时，以用户为中心的重要性。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日