Deep learning is now widely used in drug discovery, where it significantly accelerates development and reduces cost. Molecular representation is the most fundamental building block: it underpins molecular property prediction, which in turn enables various downstream applications. Most existing methods attempt to incorporate more information to learn better representations. However, not all features are equally important for a specific task, and ignoring this can compromise both training efficiency and predictive accuracy. To address this issue, we propose a novel approach that treats a language model as an agent and a molecular pretraining model as a knowledge base. The agent accentuates task-relevant features in the molecular representation by understanding the natural language description of the task, just as a tailor customizes clothes for clients. We therefore call this approach MolTailor. Evaluations demonstrate MolTailor's superior performance over baselines, validating the efficacy of enhancing task relevance in molecular representation learning. This illustrates the potential of language-model-guided optimization to better exploit and unleash the capabilities of existing powerful molecular representation methods. Our code is available at https://github.com/SCIR-HI/MolTailor.