As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, screening LNPs is time-consuming and costly. To expedite the identification of mRNA delivery systems with high transfection efficiency, we propose TransMA, an explainable model for predicting LNP transfection efficiency. TransMA employs a multi-modal molecular structure fusion architecture with two extractors: a fine-grained atomic spatial relationship extractor, named molecule 3D Transformer, captures three-dimensional spatial features of the molecule, and a coarse-grained atomic sequence extractor, named molecule Mamba, captures one-dimensional molecular features. We design a mol-attention mechanism block that aligns the coarse-grained and fine-grained atomic features and captures the relationships between atomic spatial and sequential structures. TransMA achieves state-of-the-art performance in predicting transfection efficiency under both the scaffold and cliff data splitting methods on the current largest LNP dataset, which covers the HeLa and RAW cell lines. Moreover, we find that TransMA captures the relationship between subtle structural changes and significant variations in transfection efficiency, providing valuable insights for LNP design. Additionally, TransMA's predictions on external transfection efficiency data preserve the ordering of the measured transfection efficiencies, demonstrating robust generalization. The code, model, and data are publicly available at https://github.com/wklix/TransMA/tree/master. We hope that high-accuracy transfection prediction models can in the future aid LNP design and initial screening, thereby helping to accelerate the mRNA design process.
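The claim that predictions on external data keep the same order as the measured transfection efficiencies can be checked with a rank correlation. The sketch below is a minimal illustration that assumes Spearman's coefficient as the measure; the abstract does not state which statistic was used, the function names `average_ranks` and `spearman` are hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rp, rm = average_ranks(predicted), average_ranks(measured)
    n = len(rp)
    mean_p, mean_m = sum(rp) / n, sum(rm) / n
    cov = sum((a - mean_p) * (b - mean_m) for a, b in zip(rp, rm))
    var_p = sum((a - mean_p) ** 2 for a in rp)
    var_m = sum((b - mean_m) ** 2 for b in rm)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Illustrative values only (not from the paper): same ordering in both lists.
predicted = [0.1, 0.4, 0.9]
measured = [1.0, 2.0, 5.0]
rho = spearman(predicted, measured)
```

A coefficient near 1 means the predicted ordering matches the measured one, which is the property that matters when a model is used to prioritize candidates for initial screening rather than to estimate absolute efficiencies.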