GlycanML: A Multi-Task and Multi-Structure Benchmark for Glycan Machine Learning

Glycans are fundamental biomolecules that perform essential functions in living organisms. The rapid growth of functional glycan data provides a good opportunity for machine learning approaches to glycan understanding. However, there is still no standard machine learning benchmark for glycan property and function prediction. In this work, we fill this gap by building a comprehensive benchmark for Glycan Machine Learning (GlycanML). The GlycanML benchmark consists of diverse types of tasks, including glycan taxonomy prediction, glycan immunogenicity prediction, glycosylation type prediction, and protein-glycan interaction prediction. Glycans can be represented by both sequences and graphs in GlycanML, which enables us to extensively evaluate sequence-based models and graph neural networks (GNNs) on the benchmark tasks. Furthermore, by concurrently performing eight glycan taxonomy prediction tasks, we introduce the GlycanML-MTL testbed for multi-task learning (MTL) algorithms. We also evaluate how taxonomy prediction can boost the other three function prediction tasks through MTL. Experimental results show that modeling glycans with multi-relational GNNs is superior, and that suitable MTL methods can further boost model performance. We provide all datasets and source code at https://github.com/GlycanML/GlycanML and maintain a leaderboard at https://GlycanML.github.io/project.
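To make the sequence-versus-graph distinction concrete, the following is a minimal sketch that converts a glycan written in IUPAC-condensed notation into a graph whose nodes are monosaccharides and whose edges are labeled with the glycosidic linkage, so that the linkage can serve as a relation type for a multi-relational GNN. The function name `glycan_to_graph`, the simplified tokenizer, and the example string are assumptions made for illustration; this is not the benchmark's own preprocessing code.

```python
import re

# Simplified tokenizer (an assumption): a linkage such as "(a1-3)",
# a branch bracket, or an alphanumeric residue name such as "GlcNAc".
TOKEN = re.compile(r"\(([^()]+)\)|(\[)|(\])|([A-Za-z0-9]+)")


def glycan_to_graph(iupac):
    """Parse an IUPAC-condensed string into (nodes, edges).

    nodes: list of monosaccharide names; index 0 is the rightmost
           (reducing-end) residue.
    edges: list of (child, parent, linkage) triples; the linkage string
           acts as the edge (relation) type.
    """
    tokens = [m.group(0) for m in TOKEN.finditer(iupac)]
    nodes, edges = [], []
    branch_points = []   # parents to return to once a branch has been read
    current = None       # residue that the next residue will attach to
    linkage = None       # linkage label waiting for its child residue

    # Read right to left, starting from the reducing end.
    for tok in reversed(tokens):
        if tok == "]":
            # Entering a branch (seen from the right): remember the parent.
            branch_points.append(current)
        elif tok == "[":
            # Branch finished: resume attaching to the remembered parent.
            current = branch_points.pop()
        elif tok.startswith("("):
            linkage = tok[1:-1]
        else:
            nodes.append(tok)
            child = len(nodes) - 1
            if current is not None:
                edges.append((child, current, linkage))
            current = child
    return nodes, edges


# Illustrative input: a branched trimannosyl core written in IUPAC-condensed form.
nodes, edges = glycan_to_graph("Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc")
relation_types = sorted({linkage for _, _, linkage in edges})
```

The input string itself is the sequence view that a sequence-based model would consume, while `nodes`, `edges`, and `relation_types` give the graph view. The tokenizer only handles plain alphanumeric residue names, so modified residues with commas or other punctuation would need a richer pattern.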

