In this review, we examine the problem of designing interpretable and explainable machine learning models. Interpretability and explainability lie at the core of many machine learning and statistical applications in medicine, economics, law, and the natural sciences. Although neither property has a clear, universally accepted definition, many techniques motivated by them have been developed over the past 30 years, with the focus currently shifting towards deep learning methods. We emphasise the divide between interpretability and explainability and illustrate these two research directions with concrete examples of the state of the art. The review is intended for a general machine learning audience interested in exploring the problems of interpretation and explanation beyond logistic regression or random forest variable importance. This work is not an exhaustive literature survey, but rather a primer focusing selectively on certain lines of research which the authors found interesting or informative.