SUBPLEX: Towards a Better Understanding of Black Box Model Explanations at the Subpopulation Level

Interpreting machine learning (ML) models is of paramount importance when making decisions with societal impact, such as in transport control, financial activities, and medical diagnosis. Current model interpretation methodologies either approximate a model with locally linear functions or build self-explanatory models that explain each input instance; they do not address interpretation at the subpopulation level, that is, how model explanations vary across different subsets of a dataset. To address the challenge of explaining an ML model across a whole dataset, we propose SUBPLEX, a visual analytics system that helps users understand black-box model explanations through subpopulation visual analysis. SUBPLEX was designed through an iterative design process with machine learning researchers to address three usage scenarios drawn from real-life machine learning tasks: model debugging, feature selection, and bias detection. The system applies a novel subpopulation analysis to ML model explanations and uses interactive visualization to explore the explanations of a dataset at different levels of granularity. Using the system, we conduct a user evaluation to assess how understanding interpretations at the subpopulation level influences the sense-making process of interpreting ML models from a user's perspective. Our results suggest that, by providing model explanations for different groups of data, SUBPLEX encourages users to generate more ingenious ideas to enrich their interpretations. It also helps users tightly integrate their programming workflow with the visual analytics workflow. Finally, we summarize the considerations observed in applying visualization to machine learning interpretation.
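
The abstract does not say how SUBPLEX computes its subpopulation analysis, so the following is only a minimal sketch of the general idea of aggregating explanations over subsets of a dataset. It assumes that per-instance feature attributions are already available as rows of numbers and that each instance carries a subpopulation label; the function name `aggregate_attributions`, the labels, and the data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def aggregate_attributions(attributions, groups):
    """Average per-instance feature attributions within each subpopulation.

    attributions: list of rows, one row of feature attributions per instance.
    groups: list of subpopulation labels, one per instance.
    Both inputs are assumptions for illustration, not the SUBPLEX method.
    """
    if len(attributions) != len(groups):
        raise ValueError("attributions and groups must have the same length")

    # Collect the attribution rows that belong to each subpopulation.
    rows_by_group = defaultdict(list)
    for row, label in zip(attributions, groups):
        rows_by_group[label].append(row)

    # Column-wise mean: one averaged attribution vector per subpopulation.
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_group.items()
    }


# Hypothetical attributions for four instances and two features.
attributions = [
    [0.5, -0.25],
    [0.25, -0.75],
    [-0.5, 0.25],
    [-1.0, 0.75],
]
groups = ["A", "A", "B", "B"]

summary = aggregate_attributions(attributions, groups)
for label, averaged in sorted(summary.items()):
    print(label, averaged)
```

In this toy data the first feature pushes predictions up on average in group A and down in group B, a contrast that a single dataset-wide average would hide. This is the kind of difference that a subpopulation-level view is meant to surface.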

