Deep learning is now widely used in drug discovery, where it significantly accelerates development and reduces cost. Molecular representation is the most fundamental building block: it underlies molecular property prediction, which in turn enables a range of downstream applications. Most existing methods try to incorporate more information to learn better representations. However, not all features are equally important for a given task, and ignoring this can compromise both training efficiency and predictive accuracy. To address this issue, we propose a novel approach that treats a language model as an agent and a molecular pretraining model as a knowledge base. The agent accentuates task-relevant features in the molecular representation by understanding the natural language description of the task, just as a tailor customizes clothes for a client. We therefore call this approach MolTailor. Evaluations demonstrate MolTailor's superior performance over baselines, validating the efficacy of enhancing task relevance in molecular representation learning. This illustrates the potential of language-model-guided optimization to better exploit the capabilities of existing powerful molecular representation methods. Our code and appendix are available at https://github.com/SCIR-HI/MolTailor.