Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult because of regulatory constraints and commercial interests. This work proposes applying federated learning (FL), which we argue is reconcilable with the industry's constraints, as it does not require sharing any information that would reveal the participating entities' data or any high-level summary of it. When applied to a representative GraphDTA model and the KIBA dataset, federated learning achieves up to 15% improved performance relative to the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in the DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefits of adding new data and the cost of adding more clients.
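To make the federated setting concrete, the following is a minimal sketch of one server-side aggregation step using federated averaging (FedAvg), a common FL aggregation rule. The abstract does not state which aggregation rule is used in this work, so the choice of FedAvg is an assumption, as are the function name `fed_avg` and the flat-list representation of model parameters. Only locally trained parameters and local sample counts reach the server; the raw drug-target records stay with each client.

```python
from typing import List, Sequence


def fed_avg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Hypothetical helper: average client parameter vectors,
    weighting each client by its number of local training samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one non-empty weight vector per client size")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        # Each client contributes in proportion to its share of the data.
        for i, value in enumerate(weights):
            aggregated[i] += value * size / total
    return aggregated


# Two illustrative clients holding 1 and 3 samples respectively.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a full training loop the server would send `global_weights` back to the clients, each client would continue training on its private data, and the step would repeat for a number of rounds.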