A distribution shift can have fundamental consequences, such as signaling a change in the operating environment or significantly reducing the accuracy of downstream models. Understanding a distribution shift is therefore critical for examining and, ideally, mitigating its effects. Most prior work focuses merely on detecting whether a shift has occurred and assumes that any detected shift can be understood and handled appropriately by a human operator. We aim to aid these manual mitigation tasks by explaining the distribution shift using interpretable transportation maps from the original distribution to the shifted one. We derive these maps from a relaxation of optimal transport in which the candidate mappings are restricted to a set of interpretable mappings. We then examine several quintessential use cases of distribution shift in real-world tabular, text, and image datasets to show that our explanatory mappings provide a better balance between detail and interpretability than baseline explanations, as judged by both visual inspection and our PercentExplained metric.
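To make the idea of a restricted transport map concrete, the following is a minimal sketch rather than the method described above. It assumes one particular interpretable family, a mean shift applied to at most k features, and it assumes that a PercentExplained-style score can be read as the relative reduction in squared Wasserstein-2 distance to the shifted data after applying the map. The abstract names the metric but does not define it, so that definition, along with the function names `w2_squared`, `k_sparse_mean_shift`, and `percent_explained`, is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def w2_squared(a, b):
    """Squared Wasserstein-2 distance between two equal-size samples.

    With uniform weights and equal sample counts, the optimal transport
    plan is a one-to-one matching, so an assignment solver is exact.
    """
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal-size samples")
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def k_sparse_mean_shift(source, target, k):
    """Assumed interpretable family: shift the means of at most k features.

    Returns the map and the indices of the features it moves.
    """
    delta = target.mean(axis=0) - source.mean(axis=0)
    keep = np.argsort(-np.abs(delta))[:k]  # k largest mean differences
    sparse_delta = np.zeros_like(delta)
    sparse_delta[keep] = delta[keep]
    return (lambda x: x + sparse_delta), keep


def percent_explained(source, target, mapping):
    """Assumed definition: relative drop in squared W2 after mapping."""
    before = w2_squared(source, target)
    if before == 0:
        raise ValueError("source and target samples are identical")
    after = w2_squared(mapping(source), target)
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(200, 5))
    # Synthetic shift: only the first feature's mean changes.
    target = rng.normal(size=(200, 5)) + np.array([3.0, 0.0, 0.0, 0.0, 0.0])
    for k in (1, 5):
        mapping, moved = k_sparse_mean_shift(source, target, k)
        score = percent_explained(source, target, mapping)
        print(f"k={k} moved features {sorted(moved.tolist())}: {score:.1f}%")
```

In this sketch a small k gives a short, readable explanation ("feature 0 moved by about 3"), while a larger k captures more of the shift at the cost of a longer explanation, which is the detail-versus-interpretability trade-off the abstract refers to.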