Large language models (LLMs) show substantial promise for advancing scientific discovery, yet deploying them in disciplines that demand factual precision and adherence to domain constraints remains challenging. In molecular design for pharmaceutical development, these models can propose innovative molecular modifications but frequently generate chemically infeasible structures. We introduce VALID-Mol, a comprehensive framework that integrates chemical validation with LLM-driven molecular design, raising the rate of valid chemical structure generation from 3% to 83%. Our methodology combines systematic prompt optimization, automated chemical verification, and domain-adapted fine-tuning to ensure dependable generation of synthesizable molecules with enhanced properties. Our contribution extends beyond implementation details to provide a transferable methodology for scientifically constrained LLM applications with measurable reliability improvements. Computational analyses indicate that our framework generates promising synthesis candidates with predicted improvements of up to 17-fold in target binding affinity while preserving synthetic feasibility.
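To make the idea of pairing LLM generation with automated verification concrete, the following is a minimal sketch and not the VALID-Mol implementation. It assumes candidates arrive as SMILES strings, which the abstract does not state. All function names (`toy_smiles_check`, `first_valid`, `validity_rate`) are hypothetical, and the toy check is only a syntactic stand-in for a real cheminformatics validator.

```python
from typing import Callable, Iterable, Optional


def toy_smiles_check(smiles: str) -> bool:
    """Toy stand-in for a real chemical validator (NOT a chemistry check).

    It only verifies that parentheses are balanced and that each
    single-digit ring-closure label occurs an even number of times.
    Digits inside brackets (isotopes, charges) are not handled.
    """
    if not smiles:
        return False
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # closing parenthesis without an opener
                return False
    if depth != 0:  # unclosed branch
        return False
    return all(smiles.count(d) % 2 == 0 for d in "123456789")


def first_valid(
    candidates: Iterable[str],
    is_valid: Callable[[str], bool] = toy_smiles_check,
    max_attempts: int = 5,
) -> Optional[str]:
    """Return the first candidate that passes validation, or None.

    `candidates` stands in for successive model outputs (hypothetical);
    at most `max_attempts` of them are examined.
    """
    for attempt, smiles in enumerate(candidates):
        if attempt >= max_attempts:
            break
        if is_valid(smiles):
            return smiles
    return None


def validity_rate(
    outputs: Iterable[str],
    is_valid: Callable[[str], bool] = toy_smiles_check,
) -> float:
    """Fraction of outputs that pass validation (0.0 for an empty batch)."""
    items = list(outputs)
    if not items:
        return 0.0
    return sum(is_valid(s) for s in items) / len(items)


if __name__ == "__main__":
    # The first string has an unclosed branch, so the second is returned.
    print(first_valid(["C1CC(", "c1ccccc1"]))
```

In practice, `is_valid` would be replaced by a check from a cheminformatics toolkit, and a validity rate such as the 3% and 83% figures reported above would be computed over a batch of model outputs in the same way as `validity_rate`.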