Following the milestones in large language models (LLMs) and multimodal models, we have seen a surge in applying LLMs to biochemical tasks. Leveraging graph features and molecular text representations, LLMs can tackle various tasks, such as predicting chemical reaction outcomes and describing molecular properties. However, most current work overlooks the multi-level nature of graph features. The impact of different feature levels on LLMs, and the importance of each level, remain unexplored, and different chemistry tasks may require different feature levels. In this work, we first investigate the effect of feature granularity by fusing GNN-generated feature tokens, discovering that even reducing all tokens to a single token does not significantly impact performance. We then explore the effect of various feature levels on performance, finding that both the quality of LLM-generated molecules and performance on different tasks benefit from different feature levels. We conclude with two key insights: (1) current molecular multimodal LLMs (MLLMs) lack a comprehensive understanding of graph features, and (2) static processing is not sufficient for hierarchical graph features. Our code will be publicly available soon.
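The token-fusion experiment described above can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so the snippet below assumes a simple mean-pooling scheme: the node-level tokens produced by a GNN are split into `num_fused` groups, each group is averaged, and the result is linearly projected into the LLM's embedding space. All dimensions and the function name `fuse_tokens` are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def fuse_tokens(node_tokens: np.ndarray, num_fused: int = 1) -> np.ndarray:
    """Mean-pool a (num_nodes, d) array of GNN node tokens into num_fused tokens.

    num_fused=1 corresponds to the extreme case in the abstract, where all
    graph-feature tokens are reduced to a single token.
    """
    chunks = np.array_split(node_tokens, num_fused, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])

rng = np.random.default_rng(0)
atom_tokens = rng.standard_normal((24, 300))    # 24 atom-level GNN tokens, dim 300 (hypothetical)
proj = rng.standard_normal((300, 768)) * 0.01   # linear projection into a 768-dim LLM space (hypothetical)

fused = fuse_tokens(atom_tokens, num_fused=1) @ proj
print(fused.shape)  # (1, 768): the whole molecule compressed to one LLM-space token
```

Varying `num_fused` (e.g. one token per functional group, or one per atom) is one way to probe the feature-granularity question the abstract raises.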