Surgical Aggregation: A Collaborative Learning Framework for Harmonizing Distributed Medical Imaging Datasets with Diverse Tasks

Large-scale chest x-ray datasets have been curated for the detection of abnormalities using deep learning, with the potential to provide substantial benefits across many clinical applications. However, each dataset focuses only on detecting a subset of findings that can be simultaneously present in a patient, thereby limiting its clinical utility. Therefore, data harmonization is crucial to leverage these datasets in aggregate to train clinically-useful, robust models with a complete representation of all abnormalities that may occur within the thorax. To that end, we propose surgical aggregation, a collaborative learning framework for harmonizing and aggregating knowledge from distributed heterogeneous datasets with partial disease annotations. We evaluate surgical aggregation across synthetic iid datasets and real-world large-scale non-iid datasets with partial annotations. Our results indicate that surgical aggregation significantly outperforms current strategies, has better generalizability, and has the potential to revolutionize the development clinically-useful models as AI-assisted disease characterization becomes a mainstay in radiology.

翻译：大规模胸部X光数据集已被整理用于基于深度学习的异常检测，有望在众多临床应用中带来显著收益。然而，每个数据集仅专注于检测患者体内可能同时存在的部分病变，从而限制了其临床实用性。因此，数据协调对于聚合利用这些数据集至关重要，旨在训练具有完整胸腔异常表征的临床实用且鲁棒的模型。为此，我们提出手术聚合这一协同学习框架，用于协调和整合具有部分疾病标注的分布式异构数据集中的知识。我们在合成的独立同分布数据集和真实世界的大规模非独立同分布部分标注数据集上对手术聚合进行了评估。结果表明，手术聚合在性能上显著优于现有策略，具有更强的泛化能力，并有望随着人工智能辅助疾病诊断成为放射学主流，彻底改变临床实用模型的开发范式。

相关内容

数据集

关注 88

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【新开放书】医学影像原理与应用，Medical Imaging Principles and Applications

专知会员服务

90+阅读 · 2019年12月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日