Are Metrics Enough? Guidelines for Communicating and Visualizing Predictive Models to Subject Matter Experts

Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts (SMEs). Accuracy and error metrics alone fail to tell the whole story of a model (its risks, strengths, and limitations), making it difficult for SMEs to feel confident in their decision to use a model. As a result, models may fail in unexpected ways or go entirely unused, as SMEs disregard poorly presented models in favor of familiar, yet arguably substandard, methods. In this paper, we describe an iterative study conducted with both SMEs and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share the common goal of understanding the data and the model's predictions, friction can stem from unfamiliar terms, metrics, and visualizations, which limits the transfer of knowledge to SMEs and discourages them from asking clarifying questions during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We demonstrate our guidelines in a regression modeling scenario and elicit feedback on their use from SMEs. In our demonstration, SMEs were more comfortable discussing a model's performance, more aware of the trade-offs of the presented model, and better equipped to assess the model's risks, ultimately informing and contextualizing the model's use beyond text and numbers.
