Enzymes and the specific reactions they catalyze are essential to all aspects of life, enabling diverse biological processes and adaptations. Predicting enzyme function is essential for understanding biological pathways, guiding drug development, improving bioproduct yields, and supporting evolutionary studies. To address the complexity of this task, we introduce a new approach that annotates enzymes by the reactions they catalyze. This approach gives detailed insight into specific reactions and adapts to newly discovered reactions, in contrast to traditional classification by protein family or by expert-derived reaction classes. We use machine learning algorithms to analyze enzyme-reaction datasets, yielding a more refined view of enzyme functionality. Our evaluation uses the largest enzyme-reaction dataset to date, derived from the SwissProt and Rhea databases with entries up to January 8, 2024. We frame enzyme-reaction prediction as a retrieval problem, aiming to rank enzymes by their catalytic ability for specific reactions. With our model, we can recruit proteins for novel reactions and predict reactions in novel proteins, facilitating enzyme discovery and function annotation (https://github.com/WillHua127/ReactZyme).
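To make the retrieval framing concrete, the following minimal sketch ranks candidate enzymes against a single query reaction. It is illustrative only: the abstract does not specify how enzymes and reactions are represented or scored, so the fixed-length embedding vectors, the cosine-similarity score, the toy values, and the names `cosine` and `rank_enzymes` are all assumptions rather than the authors' method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_enzymes(reaction_vec, enzyme_vecs, top_k=None):
    """Return (enzyme_name, score) pairs sorted from most to least similar.

    reaction_vec: hypothetical embedding of the query reaction.
    enzyme_vecs:  dict mapping enzyme name to a hypothetical embedding.
    The score is a stand-in for "catalytic ability", not a measured quantity.
    """
    scored = [(name, cosine(reaction_vec, vec)) for name, vec in enzyme_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Toy, hand-made embeddings (assumed values, not real data).
enzymes = {
    "enzyme_A": [1.0, 0.0, 0.0],
    "enzyme_B": [0.6, 0.8, 0.0],
    "enzyme_C": [0.0, 0.0, 1.0],
}
reaction = [1.0, 0.2, 0.0]

ranking = rank_enzymes(reaction, enzymes)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same ranking function could in principle serve both directions mentioned in the abstract: fixing a reaction and ranking proteins, or fixing a protein and ranking reactions, provided both live in a shared embedding space.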