Motivation: RNA design aims to find RNA sequences that fold into a given target secondary structure, a problem also known as RNA inverse folding. However, not all target structures are designable. Recent work on RNA designability has focused primarily on minimum free energy (MFE)-based criteria, while ensemble-based notions of designability remain underexplored. To address this gap, we introduce a theory of ensemble approximation and a probability decomposition framework for bounding the folding probabilities of RNA structures in an explainable way. We further develop a linear-time dynamic programming algorithm that efficiently searches over exponentially many decompositions and identifies the optimal one, which yields the tightest probabilistic bound for a given structure.

Results: Applying our methods to both native and artificial RNA structures in the ArchiveII and Eterna100 benchmarks, we obtained probability bounds that are much tighter than those obtained by prior approaches. Our methods also provide anatomical tools for analyzing RNA structures and understanding the sources of design difficulty at the motif level.

Availability: Source code and data are available at https://github.com/shanry/RNA-Undesign.

Supplementary information: Supplementary text and data are available in a separate PDF.