Deep neural two-sample tests have recently shown strong power for detecting distributional differences between groups, yet their black-box nature limits interpretability and hinders practical adoption in biomedical analysis. Moreover, most existing post-hoc explainability methods rely on class labels, making them unsuitable for label-free statistical testing. We propose an explainable deep statistical testing framework that augments deep two-sample tests with sample-level and feature-level explanations, revealing which individual samples and which input features drive a statistically significant group difference and thereby providing instance-wise and spatial insight into the test's decision. Applied to biomedical imaging data, the framework identifies influential samples and highlights anatomically meaningful regions associated with disease-related variation. This work bridges statistical inference and explainable AI, enabling interpretable, label-free population analysis in medical imaging.
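The abstract describes deep two-sample testing and sample-level explanations in general terms. As an illustrative sketch only (not the paper's architecture), a standard instantiation is the kernel MMD two-sample test with a permutation null; its witness function gives a simple sample-level explanation, assigning each point a score for how strongly it pulls the two distributions apart. All function names here are hypothetical:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2.0 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())

def permutation_test(X, Y, n_perm=200, sigma=1.0, seed=0):
    """P-value of the observed MMD^2 under random group relabeling."""
    rng = np.random.default_rng(seed)
    stat = mmd2(X, Y, sigma)
    Z, n = np.vstack([X, Y]), len(X)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        null.append(mmd2(Z[idx[:n]], Z[idx[n:]], sigma))
    p = (1 + sum(s >= stat for s in null)) / (1 + n_perm)
    return stat, p

def witness(Z, X, Y, sigma=1.0):
    """Sample-level explanation: MMD witness function evaluated at Z.
    Positive values mark points typical of X, negative of Y."""
    return (gaussian_kernel(Z, X, sigma).mean(axis=1)
            - gaussian_kernel(Z, Y, sigma).mean(axis=1))

# Toy usage: two mean-shifted 2-D groups; p should be small.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(40, 2))   # group 1
Y = rng.normal(2.0, 1.0, size=(40, 2))   # group 2, mean-shifted
stat, p = permutation_test(X, Y, n_perm=100)
scores = witness(X, X, Y)                # per-sample contribution scores
```

In a deep variant, the raw inputs would first pass through a learned feature extractor before the kernel, and feature-level explanations would attribute the test statistic back through that network; the witness scores above play the role of the sample-level explanations.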