Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. This paper provides a gentle introduction to this approach in bioinformatics and is the first to review its key applications in proteomics, genome-wide association studies (GWAS), and single-cell and multi-omics studies, together with the legal, methodological and infrastructural challenges they raise. As the evolution of biobanks in genetics and systems biology has shown, access to larger and more varied data pools leads to faster and more robust exploration and translation of results. More widespread use of federated learning may have a similar impact in bioinformatics, giving academic and clinical institutions access to many combinations of genotypic, phenotypic and environmental information that are underrepresented in, or absent from, existing biobanks.