Real-world interpretability for neural networks involves a tradeoff among three concerns: existing approaches either 1) require humans to trust an approximation of the explanation (e.g., post-hoc approaches), 2) compromise the understandability of the explanation (e.g., automatically identified feature masks), or 3) compromise model performance (e.g., decision trees). These shortcomings are unacceptable in human-facing domains such as education, healthcare, and natural language, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability while maintaining performance comparable to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable mixture-of-experts model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks. We demonstrate variations of the InterpretCC architecture for text and tabular data across several real-world benchmarks: six online education courses, news classification, breast cancer diagnosis, and review sentiment.
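To make the two mechanisms described above concrete, here is a minimal inference-time sketch in NumPy. It is not the authors' implementation and makes several assumptions: a linear gate followed by a sigmoid and a hard 0.5 threshold, linear predictors and linear experts, and a score-weighted average over the active experts. All function and parameter names (`feature_gate`, `gated_predict`, `topic_moe_predict`, `W_gate`, `W_route`, `groups`, `experts`) are hypothetical. How the discrete gate is trained (for example, how the threshold is made differentiable) is not covered.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feature_gate(x, W_gate, b_gate, threshold=0.5):
    """Hypothetical feature gate: one score per feature, per data point.
    Returns a binary mask of shape (n, d); 1 means the feature is kept."""
    scores = sigmoid(x @ W_gate + b_gate)  # (n, d) gate scores
    return (scores > threshold).astype(x.dtype)


def gated_predict(x, W_gate, b_gate, w_pred, b_pred):
    """Sparsely activate features, then predict only from the kept ones."""
    mask = feature_gate(x, W_gate, b_gate)
    # The predictor never sees masked-out features, so the mask is the explanation.
    logits = (x * mask) @ w_pred + b_pred
    return sigmoid(logits), mask


def topic_moe_predict(x, groups, W_route, b_route, experts, threshold=0.5):
    """Hypothetical interpretable mixture-of-experts.
    groups:  list of feature-index lists, one per human-specified topic.
    W_route: (d, k) router weights, one column per topic.
    experts: list of (w, b) linear heads; expert j sees only groups[j]."""
    route = sigmoid(x @ W_route + b_route)        # (n, k) topic scores
    active = (route > threshold).astype(x.dtype)  # discrete, sparse topic selection
    out = np.zeros(x.shape[0])
    for j, (idx, (w, b)) in enumerate(zip(groups, experts)):
        expert_out = x[:, idx] @ w + b            # topic-restricted subnetwork
        out += active[:, j] * route[:, j] * expert_out
    # Score-weighted average over active experts; if none is active, out stays 0.
    denom = np.maximum((active * route).sum(axis=1), 1e-8)
    return sigmoid(out / denom), active


# Usage: two human-specified topics over four features.
x = np.array([[1.0, 1.0, 2.0, 0.0]])
probs, mask = gated_predict(x, np.eye(4), np.zeros(4), np.ones(4), 0.0)
W_route = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, -1.0]])
experts = [(np.array([1.0, 1.0]), 0.0), (np.array([1.0, 1.0]), 0.0)]
p, active = topic_moe_predict(x, [[0, 1], [2, 3]], W_route, np.zeros(2), experts)
```

In this sketch, both `mask` and `active` are binary per data point, so a reader can state exactly which features or topics contributed to a given prediction. That property is what the hard threshold is meant to illustrate.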