Motivation: RNA design aims to find at least one sequence that folds with the highest probability into a designated target structure, but some structures are undesignable in the sense that no sequence folds into them. Identifying undesignable structures helps delineate and understand the limits of RNA designability, yet the problem received little attention until recently. Moreover, existing methods for establishing undesignability are neither scalable nor interpretable. Results: We introduce a novel graph representation and a new general algorithmic framework to efficiently identify undesignable motifs in a secondary structure. The proposed algorithm enumerates minimal motifs based on the loop-pair graph representation of a structure and establishes the undesignability of a motif by proposing one or more rival substructures. The framework can also identify unique minimum undesignable motifs across different structures. Our implementation identifies 26 unique minimum undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. The algorithm is also efficient enough to scale to natural structures of 16S and 23S ribosomal RNAs (about 1,500 and 3,000 nucleotides, respectively). It finds all such structures in the widely used ArchiveII database to be undesignable, with 73 unique minimum undesignable motifs, under the standard Turner energy model in ViennaRNA.
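The abstract names a loop-pair graph but does not define it. The sketch below shows one plausible reading, under stated assumptions: the structure is given in dot-bracket notation without pseudoknots, each node is a loop identified by its closing base pair (with `None` standing in for the exterior loop), and each base pair is an edge joining the loop it closes to the loop that encloses it. The function names `parse_pairs`, `loop_pair_graph`, and `connected_loop_sets` are hypothetical, and treating connected loop sets as candidate motifs is an assumption rather than the authors' stated procedure. Establishing undesignability through rival substructures needs an energy model and is not attempted here.

```python
from collections import defaultdict


def parse_pairs(structure):
    """Map each paired position to its partner (0-indexed, dot-bracket input)."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def loop_pair_graph(structure):
    """Return edges (enclosing_loop, enclosed_loop), one per base pair.

    Assumption: a loop is named by its closing pair (i, j); the exterior
    loop is None. The edge for pair (i, j) links the loop that (i, j)
    closes to the loop in which (i, j) sits.
    """
    partner = parse_pairs(structure)
    edges = []
    stack = [None]  # innermost enclosing loop is on top
    for i, ch in enumerate(structure):
        if ch == "(":
            loop = (i, partner[i])
            edges.append((stack[-1], loop))
            stack.append(loop)
        elif ch == ")":
            stack.pop()
    return edges


def connected_loop_sets(edges, max_loops):
    """Enumerate connected sets of at most max_loops loops (candidate motifs)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    found = {frozenset([node]) for node in adj}
    frontier = set(found)
    for _ in range(max_loops - 1):
        grown = set()
        for loops in frontier:
            for node in loops:
                for neighbour in adj[node]:
                    if neighbour not in loops:
                        grown.add(loops | {neighbour})
        found |= grown
        frontier = grown
    return found


if __name__ == "__main__":
    for parent, child in loop_pair_graph("((...)(...))"):
        print(parent, "--", child)
```

Because every base pair has exactly one enclosing loop, the resulting graph is a tree, so enumerating connected loop sets by growing them one neighbouring loop at a time never revisits a set through a cycle. The number of such sets still grows quickly with `max_loops`, which is why this sketch caps the size explicitly.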