Interpretability is often cited as a key requirement for trustworthy machine learning. However, learning and releasing models that are inherently interpretable leaks information about the underlying training data. Since such disclosure may directly conflict with privacy, precisely quantifying the privacy impact of this leakage is a fundamental problem. For instance, previous work has shown that the structure of a decision tree can be leveraged to build a probabilistic reconstruction of its training dataset, with the uncertainty of this reconstruction serving as a relevant measure of the information leak. In this paper, we propose a novel framework that generalizes these probabilistic reconstructions, in the sense that it can handle other forms of interpretable models and more generic types of knowledge. In addition, we demonstrate that, under realistic assumptions regarding the interpretable models' structure, the uncertainty of the reconstruction can be computed efficiently. Finally, we illustrate the applicability of our approach on both decision trees and rule lists by comparing the theoretical information leak associated with exact versus heuristic learning algorithms. Our results suggest that, for a given accuracy level, optimal interpretable models are often more compact and leak less information about their training data than greedily built ones.
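To make the idea of "uncertainty of the reconstruction as a measure of the leak" concrete, here is a minimal toy sketch. It is not the framework proposed in the paper. It assumes binary, mutually independent attributes with known prior marginals, and a decision tree whose leaves reveal the attribute values fixed along their path together with the number of training examples each leaf contains. Under these assumptions, each example's attributes that are fixed on its leaf's path are known exactly (0 bits). Every other attribute keeps its prior entropy. The leak is then measured as the drop in total Shannon entropy relative to knowing only the dataset size. The helper names (`binary_entropy`, `reconstruction_entropy`, `baseline_entropy`) and the toy tree are hypothetical illustrations.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical leaf representation: the attribute values fixed along the
# leaf's path (attribute name -> 0/1) and the number of examples it contains.
Leaf = Tuple[Dict[str, int], int]


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def reconstruction_entropy(leaves: List[Leaf], priors: Dict[str, float]) -> float:
    """Entropy (bits) of a naive probabilistic reconstruction from a tree.

    Illustrative assumptions: attributes are binary and independent; an
    attribute fixed on a leaf's path is known exactly for every example in
    that leaf; any other attribute keeps its prior P(attribute = 1).
    """
    total = 0.0
    for constraints, count in leaves:
        free = [a for a in priors if a not in constraints]
        total += count * sum(binary_entropy(priors[a]) for a in free)
    return total


def baseline_entropy(n_examples: int, priors: Dict[str, float]) -> float:
    """Entropy of the same reconstruction when only the dataset size is known."""
    return n_examples * sum(binary_entropy(p) for p in priors.values())


if __name__ == "__main__":
    priors = {"a": 0.5, "b": 0.5, "c": 0.5}
    # Toy tree: split on "a", then on "b" in the right branch; 10 examples total.
    tree = [({"a": 0}, 4), ({"a": 1, "b": 0}, 3), ({"a": 1, "b": 1}, 3)]
    after = reconstruction_entropy(tree, priors)
    before = baseline_entropy(10, priors)
    print(f"uncertainty left: {after} bits, leak: {before - after} bits")
```

Running the script prints `uncertainty left: 14.0 bits, leak: 16.0 bits`. Under this toy measure, a tree that fixes fewer attribute values per leaf leaves more uncertainty and therefore leaks fewer bits. This matches the intuition that more compact models reveal less about their training data.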