In high-stakes domains like healthcare, users often expect that sharing personal information with machine learning systems will yield tangible benefits, such as more accurate diagnoses and clearer explanations of contributing factors. However, the validity of this assumption remains largely unexplored. We propose a unified framework to quantify how personalizing a model influences both its predictions and its explanations. We show that the impacts of personalization on prediction and explanation can diverge: a model may become more or less explainable even when its predictions are unchanged. For practical settings, we study a standard hypothesis test for detecting personalization effects on demographic groups. We derive a finite-sample lower bound on its probability of error as a function of group sizes, the number of personal attributes, and the desired benefit from personalization. This yields actionable insights, such as which dataset characteristics are necessary to test for an effect, or the maximum effect size that can be tested given a dataset. We apply our framework to real-world tabular datasets using feature-attribution methods, uncovering scenarios where effects are fundamentally untestable due to dataset statistics. Our results highlight the need to evaluate prediction and explanation jointly in personalized models, and the importance of designing models and datasets with sufficient information for such evaluation.