In past work (Onokpasa, Wild, Wong, DCC 2023), we showed that (a) for the joint compression of RNA sequence and structure, stochastic context-free grammars (SCFGs) are the best compressors known, and (b) grammars with better compression ability also perform better in ab initio structure prediction. Previous grammars were hand-crafted by human experts. In this work, we develop a framework for automatically and systematically searching for stochastic grammars with better compression (and prediction) ability for RNA. Through an exhaustive search of small grammars, we identify grammars that outperform those designed by human experts.