Machine learning, notably deep learning, has significantly advanced molecular research in biochemistry. Traditionally, modeling in this area has centered on a handful of paradigms. For instance, the prediction paradigm is frequently used for tasks such as molecular property prediction. To improve the generation and interpretability of purely data-driven models, researchers have integrated biochemical domain knowledge into these molecular models. This integration has sparked a surge in paradigm transfer, in which one molecular learning task is solved by reformulating it as another. With the emergence of Large Language Models, these paradigms have shown a growing trend towards unification. In this work, we present a literature survey of knowledge-informed molecular learning from the perspective of paradigm transfer. We classify the paradigms, examine their methodologies, and analyze the contribution of domain knowledge. We also summarize prevailing trends and identify promising directions for future work in molecular learning.