Recent advances in deep learning have accelerated its use in various applications, such as cellular image analysis and molecular discovery. In molecular discovery, the generative adversarial network (GAN), which comprises a discriminator that distinguishes generated molecules from existing ones and a generator that produces new molecules, is one of the leading technologies because it can learn efficiently from a large molecular data set and generate novel molecules that preserve similar properties. However, pharmaceutical companies may be unwilling or unable to share their local data sets because molecular data sets are geo-distributed and sensitive, which makes it impossible to train GANs in a centralized manner. In this paper, we propose the Graph convolutional network in Generative Adversarial Networks via Federated learning (GraphGANFed) framework, which integrates a graph convolutional network (GCN), a GAN, and federated learning (FL) into a single system that generates novel molecules without sharing local data sets. In GraphGANFed, the discriminator is implemented as a GCN to better capture features from molecules represented as molecular graphs, and FL is used to train both the discriminator and the generator in a distributed manner to preserve data privacy. Extensive simulations are conducted on three benchmark data sets to demonstrate the feasibility and effectiveness of GraphGANFed. The molecules generated by GraphGANFed achieve high novelty (= 100) and diversity (> 0.9). The simulation results also indicate that 1) a lower-complexity discriminator model can better avoid mode collapse on a smaller data set, 2) there is a tradeoff among the different evaluation metrics, and 3) an appropriate dropout ratio for the generator and discriminator can avoid mode collapse.
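To make the federated training step concrete, the following is a minimal sketch of how a server might aggregate client model parameters. It assumes a FedAvg-style average weighted by local data set size, which the abstract does not specify. The function name `federated_average`, the `Params` type, and the flat-list parameter layout are hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical layout: each model is a dict mapping a layer name to a flat list of weights.
Params = Dict[str, List[float]]


def federated_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local data set size.

    Assumption: FedAvg-style weighting; the aggregation rule actually used by
    GraphGANFed is not stated in the abstract.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one data set size per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of local samples must be positive")

    averaged: Params = {}
    for name, reference in client_params[0].items():
        # Weighted mean of every scalar weight in this layer across all clients.
        averaged[name] = [
            sum(params[name][i] * size for params, size in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


# Two clients holding 1 and 3 local molecules respectively.
clients = [{"w": [0.0, 4.0]}, {"w": [4.0, 0.0]}]
global_params = federated_average(clients, [1, 3])
```

Under this assumption, each client would train its local discriminator and generator on its private molecules, and the server would apply the same averaging separately to the discriminator parameters and to the generator parameters, so that only model weights leave each site and never the molecules themselves.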