Natural products are substances produced by living organisms, and they often combine biological activity with high structural diversity. Drug development based on natural products has been common for many years. However, the intricate structures of these compounds make structure determination and synthesis challenging, particularly when compared with the efficiency of high-throughput screening of synthetic compounds. In recent years, deep learning-based methods have been applied to molecular generation. In this study, we trained chemical language models on a natural product dataset and generated natural product-like compounds. The results showed that the distribution of the generated compounds was similar to that of natural products. We also evaluated the effectiveness of the generated compounds as drug candidates. Our method can be used to explore the vast chemical space and to reduce the time and cost of natural product-based drug discovery.