Path Signatures Enable Model-Free Mapping of RNA Modifications

Maud Lemercier, Paola Arrubarrena, Salvatore Di Giorgio, Julia Brettschneider, Thomas Cass, Valerie Griesche, Isabel S. Naarmann-de Vries, Anastasia Papavasiliou, Alessia Ruggieri, Irem Tellioglu, Chia Ching Wu, F. Nina Papavasiliou, Terry Lyons

Detecting chemical modifications on RNA molecules remains a key challenge in epitranscriptomics. Traditional reverse transcription-based sequencing methods introduce enzyme- and sequence-dependent biases and fragment RNA molecules, confounding the accurate mapping of modifications across the transcriptome. Nanopore direct RNA sequencing offers a powerful alternative by preserving native RNA molecules, enabling the detection of modifications at single-molecule resolution. However, current computational tools can identify only a limited subset of modification types, and only within well-characterized sequence contexts for which ample training data exist. Here, we introduce a model-free computational method that reframes modification detection as an anomaly detection problem, requiring only canonical (unmodified) RNA reads and no other annotated data. For each nanopore read, our approach uses the signature transform to extract robust, modification-sensitive features from the raw ionic current signal at a site, then computes an anomaly score by comparing the resulting feature vector to its nearest neighbors in an unmodified reference dataset. We convert anomaly scores into statistical p-values, enabling anomaly detection at both the individual-read and site levels. Validation on densely modified \textit{E. coli} rRNA demonstrates that our approach detects known sites harboring diverse modification types without prior training on these modifications. We further applied this framework to dengue virus (DENV) transcripts and mammalian mRNAs. In DENV sfRNA, it revealed a novel 2'-O-methylated site, which we validated orthogonally by qRT-PCR assays. These results demonstrate that our model-free approach operates robustly across different types of RNA and across datasets generated with different nanopore sequencing chemistries.
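The pipeline summarized above (signature features, nearest-neighbor anomaly score, conversion to p-values) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the signature is truncated at depth 2 and computed by hand via Chen's identity; each current segment is time-augmented with a zero basepoint prepended (signatures are translation-invariant, so without the basepoint a pure shift in current level would be invisible); the anomaly score is the mean distance to the k nearest reference vectors; and p-values are split-conformal, using held-out canonical reads for calibration. All function names (`embed_read`, `signature_depth2`, `knn_score`, `conformal_p_value`, `simulate_segment`) are hypothetical, and the signals are synthetic, not nanopore data.

```python
import numpy as np


def embed_read(current: np.ndarray) -> np.ndarray:
    """Hypothetical embedding: 1-D current segment -> 2-D path (time, current).

    A zero basepoint is prepended so the signature also sees the absolute
    current level (signatures alone are invariant to translations).
    """
    current = np.asarray(current, dtype=float)
    t = np.linspace(0.0, 1.0, len(current))
    path = np.column_stack([t, current])
    return np.vstack([np.zeros((1, 2)), path])


def signature_depth2(path: np.ndarray) -> np.ndarray:
    """Depth-2 signature of a piecewise-linear path of shape (T, d).

    Built segment by segment with Chen's identity: a linear segment with
    increment delta has level 1 = delta and level 2 = 0.5 * outer(delta, delta).
    Returns [level 1 (d values), level 2 flattened (d*d values)].
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # S2(X*Y) = S2(X) + S1(X) (x) S1(Y) + S2(Y); uses s1 before the update
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])


def knn_score(x: np.ndarray, reference: np.ndarray, k: int = 5) -> float:
    """Anomaly score: mean Euclidean distance to the k nearest reference vectors."""
    dists = np.linalg.norm(reference - x, axis=1)
    return float(np.sort(dists)[:k].mean())


def conformal_p_value(score: float, calibration_scores: np.ndarray) -> float:
    """Split-conformal p-value: share of calibration scores at least as large."""
    n = len(calibration_scores)
    return float((1 + np.sum(calibration_scores >= score)) / (n + 1))


rng = np.random.default_rng(0)


def simulate_segment(level: float, n: int = 50) -> np.ndarray:
    """Synthetic stand-in for the current segment of one read at one site."""
    return rng.normal(loc=level, scale=1.0, size=n)


def features(segment: np.ndarray) -> np.ndarray:
    return signature_depth2(embed_read(segment))


# Canonical reads are split: one part is the kNN reference, the other calibrates p-values.
reference = np.array([features(simulate_segment(0.0)) for _ in range(200)])
calibration = np.array(
    [knn_score(features(simulate_segment(0.0)), reference) for _ in range(100)]
)

query = features(simulate_segment(3.0))  # a shifted level mimics a modified site
p = conformal_p_value(knn_score(query, reference), calibration)
print(f"read-level p-value: {p:.4f}")
```

Keeping calibration reads out of the reference set matters in this sketch: a read scored against a reference that contains itself would have a near-zero distance, which would bias the p-values. How read-level p-values would be aggregated to the site level is not shown here.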

翻译：检测RNA分子的化学修饰仍然是表观转录组学中的一个关键挑战。传统的基于逆转录的测序方法会引入酶依赖性和序列依赖性的偏差，并导致RNA分子断裂，从而干扰整个转录组中修饰的准确定位。纳米孔直接RNA测序通过保留天然RNA分子，提供了一种强大的替代方案，能够以单分子分辨率检测修饰。然而，当前的计算工具只能在已有充足训练数据的、特征明确的序列上下文中识别有限的修饰类型子集。在此，我们提出一种无模型计算方法，将修饰检测重新定义为异常检测问题，仅需规范（未修饰）的RNA读段，无需任何其他注释数据。对于每个纳米孔读段，我们的方法使用签名变换从原始离子电流信号中提取稳健的、对修饰敏感的特征，然后通过将生成的特征向量与未修饰参考数据集中的最近邻进行比较来计算异常分数。我们将异常分数转换为统计p值，从而实现在单个读段和位点两个层面进行异常检测。在密集修饰的\textit{大肠杆菌} rRNA上的验证表明，我们的方法能够检测出含有多种修饰类型的已知位点，而无需事先对这些修饰进行训练。我们进一步将该框架应用于登革病毒（DENV）转录本和哺乳动物mRNA。对于DENV sfRNA，该方法揭示了一个新的2'-O-甲基化位点，我们通过qRT-PCR实验进行了正交验证。这些结果表明，我们的无模型方法能够在不同类型的RNA以及使用不同纳米孔测序化学方法生成的数据集上稳健运行。