Molecule property prediction has gained significant attention in recent years. The main bottleneck is label scarcity caused by expensive lab experiments. To alleviate this issue and to better leverage textual knowledge for tasks, this study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. We discover that existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. To overcome these issues, we propose GIMLET, which unifies language models for both graph and text data. By adopting generalized position embedding, our model is extended to encode both graph structures and instruction text without additional graph encoding modules. GIMLET also decouples the encoding of the graph from the task instructions in the attention mechanism, enhancing the generalization of graph features across novel tasks. We construct a dataset consisting of more than two thousand molecule tasks with corresponding instructions derived from task descriptions. We pretrain GIMLET on the molecule tasks along with instructions, enabling the model to transfer effectively to a broad range of tasks. Experimental results demonstrate that GIMLET significantly outperforms molecule-text baselines in instruction-based zero-shot learning, even achieving results close to those of supervised GNN models on tasks such as ToxCast and MUV.
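The two mechanisms named in the abstract, generalized position embedding and decoupling of graph encoding from instructions in attention, can be illustrated with a minimal sketch. It is not the GIMLET implementation. It assumes that graph nodes and instruction tokens are concatenated into one sequence, that node pairs are indexed by shortest-path distance, that text pairs are indexed by relative offset, and that graph-node queries are masked from attending to instruction tokens. The function names, the `"cross"` bucket, and the mask convention are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(num_nodes, edges):
    """All-pairs shortest-path lengths by BFS; -1 marks unreachable pairs."""
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[-1] * num_nodes for _ in range(num_nodes)]
    for s in range(num_nodes):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == -1:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def build_positions_and_mask(num_nodes, edges, num_text_tokens):
    """Build pairwise position ids and an attention mask for the
    sequence [graph nodes | instruction tokens].

    position[i][j] is a (kind, value) pair for query i and key j:
      ("graph", d)  shortest-path distance between two nodes
      ("text", k)   relative offset j - i between two text tokens
      ("cross", 0)  hypothetical shared bucket for graph/text pairs
    allowed[i][j] is True when query i may attend to key j.
    """
    n = num_nodes + num_text_tokens
    dist = shortest_path_lengths(num_nodes, edges)
    position = [[None] * n for _ in range(n)]
    allowed = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_is_graph = i < num_nodes
            j_is_graph = j < num_nodes
            if i_is_graph and j_is_graph:
                position[i][j] = ("graph", dist[i][j])
                allowed[i][j] = True
            elif not i_is_graph and not j_is_graph:
                position[i][j] = ("text", j - i)
                allowed[i][j] = True
            else:
                position[i][j] = ("cross", 0)
                # Assumed decoupling: text queries may read graph keys,
                # but graph queries never read instruction tokens.
                allowed[i][j] = not i_is_graph
    return position, allowed
```

For a three-atom chain with edges `(0, 1)` and `(1, 2)` followed by two instruction tokens, the node rows of `allowed` are `False` in every text column. Under these assumptions the node representations do not depend on the instruction, which is one way to read the abstract's claim that graph features generalize across novel tasks.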