Molecular design encompasses tasks ranging from de novo design to structural alteration of given molecules or fragments. For the latter, state-of-the-art methods predominantly function as "Instance Optimizers", expending significant compute by restarting the search for every input structure. While model-based approaches theoretically offer amortized efficiency by learning a policy that transfers to unseen structures, existing methods struggle to generalize. We identify a key failure mode: the high variance arising from the heterogeneous difficulty of distinct starting structures. To address this, we introduce GRXForm, which adapts a pre-trained Graph Transformer model that optimizes molecules via sequential atom-and-bond additions. We employ Group Relative Policy Optimization (GRPO) for goal-directed fine-tuning, mitigating variance by normalizing rewards relative to the starting structure. Empirically, GRXForm generalizes to out-of-distribution molecular scaffolds without inference-time oracle calls or refinement, achieving scores in multi-objective optimization competitive with leading instance optimizers.
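The variance-reduction idea behind normalizing rewards relative to the starting structure can be illustrated with a minimal sketch. It assumes the common GRPO form, in which the rewards of several completions sampled from the same starting structure are standardized by that group's mean and standard deviation; the exact normalization used by GRXForm may differ. The function name, the scaffold labels, and the reward values are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards_by_start, eps=1e-8):
    """Standardize rewards within each group of completions.

    rewards_by_start maps a starting structure (hypothetical label) to the
    rewards of the completions sampled from it. Assumption: advantages are
    (reward - group mean) / (group std + eps), as in the usual GRPO recipe.
    """
    advantages = {}
    for start, rewards in rewards_by_start.items():
        mu = mean(rewards)
        sigma = pstdev(rewards)  # population standard deviation of the group
        advantages[start] = [(r - mu) / (sigma + eps) for r in rewards]
    return advantages


# Hypothetical rewards: one "easy" and one "hard" starting structure.
rewards_by_start = {
    "easy_scaffold": [0.80, 0.90, 1.00],
    "hard_scaffold": [0.10, 0.20, 0.30],
}
advantages = group_relative_advantages(rewards_by_start)
```

In this toy example the raw rewards of the two groups differ by a large offset, yet the resulting advantages are nearly identical, so the learning signal reflects how good a completion is relative to others from the same starting structure rather than how difficult that structure is.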