Multiple Instance Learning (MIL) is a weakly supervised learning paradigm that is becoming increasingly popular because it requires less labeling effort than fully supervised methods. This is especially attractive in areas where creating large annotated datasets remains challenging, such as medicine. Although recent deep learning MIL approaches have obtained state-of-the-art results, they are fully deterministic and do not provide uncertainty estimates for their predictions. In this work, we introduce the Attention Gaussian Process (AGP) model, a novel probabilistic attention mechanism based on Gaussian Processes for deep MIL. AGP provides accurate bag-level predictions as well as instance-level explainability, and can be trained end-to-end. Moreover, its probabilistic nature guarantees robustness to overfitting on small datasets and uncertainty estimates for the predictions. The latter is especially important in medical applications, where decisions have a direct impact on the patient's health. The proposed model is validated experimentally as follows. First, its behavior is illustrated in two synthetic MIL experiments based on the well-known MNIST and CIFAR-10 datasets, respectively. Then, it is evaluated in three different real-world cancer detection experiments. AGP outperforms state-of-the-art MIL approaches, including deterministic deep learning ones. It shows strong performance even on a small dataset with fewer than 100 labels and generalizes better than competing methods on an external test set. Moreover, we show experimentally that predictive uncertainty correlates with the risk of wrong predictions, and is therefore a good indicator of reliability in practice. Our code is publicly available.