The RNA structure-function relationship has recently garnered significant attention within the deep learning community, and it promises to grow in importance as nucleic acid structure models advance. However, the absence of standardized and accessible benchmarks for deep learning on RNA 3D structures has impeded the development of models for RNA functional characteristics. In this work, we introduce a set of seven benchmarking datasets for RNA structure-function prediction, designed to address this gap. Our library builds on the established Python library rnaglib and offers easy data distribution and encoding, dataset splitters, and evaluation methods, providing a convenient all-in-one framework for comparing models. Datasets are implemented in a fully modular and reproducible manner, facilitating community contributions and customization. Finally, we provide initial baseline results for all tasks using a graph neural network. Source code: https://github.com/cgoliver/rnaglib Documentation: https://rnaglib.org