Despite deep learning's success in chemistry, its impact is limited by poor interpretability and an inability to resolve activity cliffs, where minor structural changes trigger drastic property shifts. Current representation learning, bound by the similarity principle (structurally similar molecules are assumed to have similar properties), often fails to capture these structure-activity discontinuities. To address this, we introduce MolEvolve, an evolutionary framework that reformulates molecular discovery as an autonomous, look-ahead planning problem. Unlike traditional methods that depend on human-engineered features or rigid prior knowledge, MolEvolve uses a Large Language Model (LLM) to actively explore and evolve a library of executable chemical symbolic operations. The LLM provides a cold start, and a Monte Carlo Tree Search (MCTS) engine performs test-time planning with external tools (e.g., RDKit), so the system discovers optimal trajectories on its own. This process evolves transparent reasoning chains that translate complex structural transformations into actionable, human-readable chemical insights. Experimental results show that MolEvolve not only produces these interpretable insights but also outperforms baselines on both property prediction and molecule optimization tasks.
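To make the planning idea concrete, the following is a minimal sketch of MCTS over a small library of named operations. It is an illustration under stated assumptions, not the MolEvolve implementation: the `OPERATIONS` table, the string-append edits, and `toy_score` are all hypothetical stand-ins. In the described system the operation library would be proposed and evolved by the LLM, and the edits and scores would come from external tools such as RDKit or a property predictor.

```python
import math
import random

# Hypothetical operation library: each entry maps a molecule string to a new one.
# These naive string appends are placeholders, not chemically validated edits.
OPERATIONS = {
    "add_methyl": lambda s: s + "C",
    "add_hydroxyl": lambda s: s + "O",
    "add_fluoro": lambda s: s + "F",
}


def toy_score(mol):
    """Assumed toy objective: reward oxygen atoms, penalize molecule size."""
    return mol.count("O") - 0.1 * len(mol)


class Node:
    def __init__(self, mol, parent=None, op=None):
        self.mol, self.parent, self.op = mol, parent, op
        self.children = []
        self.untried = list(OPERATIONS)  # operations not yet expanded here
        self.visits = 0
        self.total = 0.0


def uct(child, parent_visits, c=1.4):
    """Upper confidence bound: mean reward plus an exploration bonus."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def trace(node):
    """Recover the readable chain of operation names leading to this node."""
    ops = []
    while node.parent is not None:
        ops.append(node.op)
        node = node.parent
    return ops[::-1]


def rollout(mol, depth, rng):
    """Look ahead with random operations; return the best score seen."""
    best = toy_score(mol)
    for _ in range(depth):
        mol = OPERATIONS[rng.choice(list(OPERATIONS))](mol)
        best = max(best, toy_score(mol))
    return best


def mcts(root_mol, iterations=200, max_depth=4, seed=0):
    rng = random.Random(seed)
    root = Node(root_mol)
    best_mol, best_score, best_ops = root_mol, toy_score(root_mol), []
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend through fully expanded nodes by UCT.
        while not node.untried and node.children and depth < max_depth:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
            depth += 1
        # Expansion: apply one untried operation.
        if node.untried and depth < max_depth:
            op = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(OPERATIONS[op](node.mol), parent=node, op=op)
            node.children.append(child)
            node, depth = child, depth + 1
        # Track the best molecule actually reached in the tree.
        score = toy_score(node.mol)
        if score > best_score:
            best_mol, best_score, best_ops = node.mol, score, trace(node)
        # Simulation and backpropagation.
        reward = rollout(node.mol, max_depth - depth, rng)
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    return best_mol, best_score, best_ops


if __name__ == "__main__":
    mol, score, ops = mcts("CC")
    print(mol, round(score, 2), ops)
```

The returned list of operation names plays the role of the readable reasoning chain: replaying it on the starting molecule reproduces the result, so each step of the trajectory can be inspected.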