There is growing interest in single-class modelling and out-of-distribution detection, as fully supervised machine learning models cannot reliably identify classes not included in their training. The long tail of infinitely many out-of-distribution classes in real-world scenarios, e.g., for screening, triage, and quality control, means that it is often necessary to train single-class models that represent an expected feature distribution, e.g., from only strictly healthy volunteer data. Conventional supervised machine learning would require the collection of datasets that contain enough samples of all possible diseases in every imaging modality, which is not realistic. Self-supervised learning methods with synthetic anomalies are currently amongst the most promising approaches, alongside generative auto-encoders that analyse the residual reconstruction error. However, all methods suffer from a lack of structured validation, which makes calibration for deployment difficult and dataset-dependent. Our method alleviates this by making use of multiple visually distinct synthetic anomaly learning tasks for both training and validation. This enables more robust training and generalisation. With our approach we can readily outperform state-of-the-art methods, which we demonstrate on exemplars in brain MRI and chest X-rays. Code is available at https://github.com/matt-baugh/many-tasks-make-light-work.