Visualisations drive all aspects of the Machine Learning (ML) development cycle but remain a vastly untapped resource for the research community. ML testing is a highly interactive and cognitive process that demands a human-in-the-loop approach. Besides writing tests for the code base, the bulk of the evaluation requires applying domain expertise to generate and interpret visualisations. To gain deeper insight into the process of testing ML systems, we propose to study visualisations of ML pipelines by mining Jupyter notebooks. We propose a two-pronged approach to the analysis: first, we gather general insights and trends through a qualitative study of a small sample of notebooks; we then use the knowledge gained from that study to design an empirical study on a larger sample of notebooks. Computational notebooks provide a rich source of information in three formats: text, code and images. We hope to build on existing work in image analysis, and in Natural Language Processing for text and code, to analyse the information present in notebooks, and thereby to gain a new perspective on program comprehension and debugging in the context of ML testing.
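As a rough illustration of what mining the three formats could look like, the sketch below splits a single notebook into its text, code and image parts. It is not the pipeline used in this work. It assumes notebooks follow the nbformat v4 JSON layout (a top-level `cells` list, with image outputs stored under MIME-type keys such as `image/png`), and the function name `split_notebook` and the tiny in-memory notebook are hypothetical, introduced only for this example.

```python
import json


def split_notebook(nb: dict) -> dict:
    """Split an nbformat-v4-style notebook dict into text, code and images."""
    parts = {"text": [], "code": [], "images": []}
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            parts["text"].append(source)
        elif cell.get("cell_type") == "code":
            parts["code"].append(source)
            for output in cell.get("outputs", []):
                # Stream and error outputs carry no "data" mapping.
                for mime, payload in output.get("data", {}).items():
                    if mime.startswith("image/"):
                        parts["images"].append((mime, payload))
    return parts


# Hypothetical two-cell notebook; a real one would come from
# json.load() on an .ipynb file.
example = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Confusion matrix"]},
    {"cell_type": "code",
     "source": "plt.imshow(cm)",
     "outputs": [
       {"output_type": "display_data",
        "data": {"image/png": "iVBORw0KGgo=", "text/plain": "<Figure>"}},
       {"output_type": "stream", "name": "stdout", "text": "done"}
     ]}
  ]
}
""")

parts = split_notebook(example)
print(len(parts["text"]), len(parts["code"]), len(parts["images"]))
```

Keeping the three streams separate would let each be passed to a different analysis (image analysis for the figures, language processing for the text and code), which is the division the proposal describes.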