Aggregating pharmaceutical data in the drug-target interaction (DTI) domain has the potential to deliver life-saving breakthroughs. It is, however, notoriously difficult because of regulatory constraints and commercial interests. This work proposes applying federated learning (FL), which we argue is reconcilable with the industry's constraints because it does not require sharing any information that would reveal the entities' data or any high-level summary of it. When applied to a representative GraphDTA model on the KIBA dataset, FL achieves up to 15% better performance than the best available non-privacy-preserving alternative. Our extensive battery of experiments shows that, unlike in other domains, the non-IID data distribution in DTI datasets does not degrade FL performance. Additionally, we identify a material trade-off between the benefit of adding new data and the cost of adding more clients.
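To make concrete why FL avoids pooling raw data, the sketch below shows plain federated averaging (FedAvg): each client trains on its own records and sends back only a parameter vector, which the server averages weighted by client dataset size. This is an illustrative assumption, not the authors' setup. The abstract does not state the aggregation rule, and a toy linear model trained with SGD stands in for GraphDTA. All function names (`local_update`, `fed_avg`, `run_rounds`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical record type: (feature vector, target affinity score).
Record = Tuple[Sequence[float], float]


def local_update(weights: List[float], data: List[Record],
                 lr: float = 0.05, epochs: int = 1) -> List[float]:
    """Client-side step: SGD on a toy linear model with squared loss.

    Stand-in for local GraphDTA training; the raw records never leave this function.
    """
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(updates: List[List[float]], sizes: List[int]) -> List[float]:
    """Server-side step: average client weights, weighted by local sample count."""
    total = sum(sizes)
    return [sum(u[i] * n for u, n in zip(updates, sizes)) / total
            for i in range(len(updates[0]))]


def run_rounds(global_w: List[float], client_data: List[List[Record]],
               rounds: int, lr: float = 0.05) -> List[float]:
    """Repeat broadcast -> local training -> weighted averaging."""
    for _ in range(rounds):
        # Only parameter vectors and dataset sizes reach the server.
        updates = [local_update(global_w, d, lr=lr) for d in client_data]
        sizes = [len(d) for d in client_data]
        global_w = fed_avg(updates, sizes)
    return global_w


if __name__ == "__main__":
    # Two hypothetical clients whose private data follow y = 2x.
    clients = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    print(run_rounds([0.0], clients, rounds=50))
```

Weighting by sample count lets a client holding more assay measurements pull the global model proportionally harder, which is the standard FedAvg choice. Alternative weightings would change how non-IID client distributions affect the result.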