The secondary structure of ribonucleic acid (RNA) is more stable and more accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we reformulate RNA secondary structure prediction as a K-Rook problem, which reduces prediction to probabilistic matching within a finite solution space. Building on this perspective, we introduce RFold, a simple yet effective method that learns to predict the best-matching K-Rook solution for a given sequence. RFold employs a bi-dimensional optimization strategy that decomposes the probabilistic matching problem into row-wise and column-wise components, which reduces the matching complexity and simplifies the solving process while guaranteeing the validity of the output. Extensive experiments demonstrate that RFold achieves competitive performance and runs inference about eight times faster than state-of-the-art approaches. The code and Colab demo are available at \href{http://github.com/A4Bio/RFold}{http://github.com/A4Bio/RFold}.
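To make the row-wise and column-wise decomposition concrete, the following is a minimal sketch rather than the RFold implementation. It assumes the model has already produced a symmetric score matrix over sequence positions, and it reads the K-Rook constraint as "at most one selected entry per row and per column", as with non-attacking rooks on a chessboard. The function name `row_col_argmax`, the diagonal masking, and the toy scores are illustrative assumptions and do not come from the source.

```python
import numpy as np


def row_col_argmax(scores: np.ndarray) -> np.ndarray:
    """Keep entry (i, j) only if it is the best in row i AND in column j.

    Hypothetical helper for illustration: the result has at most one 1
    per row and per column, so it is always a valid rook placement.
    """
    n = scores.shape[0]
    masked = scores.astype(float)
    # Assumption: a base cannot pair with itself, so exclude the diagonal.
    np.fill_diagonal(masked, -np.inf)

    # Row-wise component: mark the best column for each row.
    row_best = np.zeros((n, n), dtype=bool)
    row_best[np.arange(n), masked.argmax(axis=1)] = True

    # Column-wise component: mark the best row for each column.
    col_best = np.zeros((n, n), dtype=bool)
    col_best[masked.argmax(axis=0), np.arange(n)] = True

    # An entry survives only if both components agree on it.
    return (row_best & col_best).astype(int)


# Toy symmetric scores for a length-4 sequence (made-up numbers).
toy_scores = np.array([
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.8, 0.3],
    [0.2, 0.8, 0.0, 0.1],
    [0.9, 0.3, 0.1, 0.0],
])
toy_contacts = row_col_argmax(toy_scores)
print(toy_contacts)
```

In this toy case positions 0 and 3 pair with each other, as do positions 1 and 2. When two positions are not each other's best match, the entry is dropped and the position stays unpaired, which is why the number of rooks K can be smaller than the sequence length.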