Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult due to regulatory constraints and commercial interests. This work proposes the application of federated learning (FL), which we argue is reconcilable with the industry's constraints, as it does not require sharing any information that would reveal the entities' data or any high-level summary of it. When applied to a representative GraphDTA model and the KIBA dataset, FL achieves up to 15% improved performance relative to the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefits of adding new data and the cost of adding more clients.