The relationship between RNA structure and function has recently attracted interest within the deep learning community, a trend expected to intensify as nucleic acid structure models advance. Despite this momentum, the lack of standardized, accessible benchmarks for applying deep learning to RNA 3D structures hinders progress. To this end, we introduce a collection of seven benchmarking datasets specifically designed to support RNA structure-function prediction. Built on top of the established Python package rnaglib, our library streamlines data distribution and encoding, provides tools for dataset splitting and evaluation, and offers a comprehensive, user-friendly environment for model comparison. The modular and reproducible design of our datasets encourages community contributions and enables rapid customization. To demonstrate the utility of our benchmarks, we report baseline results for all tasks using a relational graph neural network.