Understanding the interpretation of machine learning (ML) models is of paramount importance when making decisions with societal impact, such as transport control, financial activities, and medical diagnosis. Current model interpretation methodologies focus on using locally linear functions to approximate the model or on creating self-explanatory models that provide explanations for each input instance; however, they do not address model interpretation at the subpopulation level, that is, understanding model explanations across different subsets of a dataset. To address the challenge of providing explanations of an ML model across a whole dataset, we propose SUBPLEX, a visual analytics system that helps users understand black-box model explanations through subpopulation visual analysis. SUBPLEX was designed through an iterative process with machine learning researchers to address three usage scenarios from real-life machine learning tasks: model debugging, feature selection, and bias detection. The system applies novel subpopulation analysis to ML model explanations and uses interactive visualization to explore the explanations at different levels of granularity. Based on the system, we conduct a user evaluation to assess how understanding interpretations at a subpopulation level influences the sense-making process of interpreting ML models from a user's perspective. Our results suggest that by providing model explanations for different groups of data, SUBPLEX encourages users to generate more insightful ideas that enrich the interpretations. It also helps users achieve a tighter integration between the programming workflow and the visual analytics workflow. Finally, we summarize the considerations we observed in applying visualization to machine learning interpretation.