Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult because of regulatory constraints and commercial interests. This work proposes applying federated learning (FL), which we argue is compatible with the industry's constraints because it does not require sharing any information that would reveal the participating entities' data or any high-level summary of it. Applied to a representative GraphDTA model and the KIBA dataset, it achieves up to 15% better performance than the best available non-privacy-preserving alternative. Our extensive experiments show that, unlike in other domains, the non-IID data distribution in DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefit of adding new data and the cost of adding more clients.
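
To make the federated setup concrete, the following is a minimal sketch of one common aggregation scheme, federated averaging (FedAvg). The abstract does not state which aggregation rule is used, so FedAvg is an assumption here. A two-parameter linear model and synthetic data stand in for GraphDTA and KIBA, and all function names (`local_update`, `fed_avg`, `make_client`, `train`) are hypothetical. Each client trains on its own data and sends only model parameters to the server; the clients draw inputs from different ranges to mimic a non-IID split.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Run a few epochs of full-batch gradient descent on one client's data.

    The raw (x, y) pairs never leave this function; only the updated
    parameters are returned to the server.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y      # residual of the linear model
            grad_w += 2.0 * err * x    # d(err^2)/dw
            grad_b += 2.0 * err        # d(err^2)/db
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def fed_avg(client_weights, client_sizes):
    """Average client parameters, weighted by local dataset size (FedAvg)."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def make_client(rng, n, x_low, x_high):
    """Synthetic client data from y = 3x + 1 plus small Gaussian noise."""
    xs = [rng.uniform(x_low, x_high) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


def train(rounds=200, seed=0):
    rng = random.Random(seed)
    # Three clients with different sizes and disjoint input ranges (non-IID).
    clients = [
        make_client(rng, 20, 0.0, 1.0),
        make_client(rng, 60, 1.0, 2.0),
        make_client(rng, 40, -1.0, 0.0),
    ]
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client starts from the current global model and trains locally.
        updates = [local_update(global_weights, data) for data in clients]
        # The server sees only the parameter tuples, not the data.
        global_weights = fed_avg(updates, [len(d) for d in clients])
    return global_weights


if __name__ == "__main__":
    print(train())
```

In this toy setting the global parameters should approach the generating values (slope 3, intercept 1), even though no single client covers the full input range. The sketch illustrates only the communication pattern; it makes no claim about the privacy guarantees or the performance figures reported above.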