Molecular structure elucidation from spectroscopic data is a long-standing challenge in chemistry, traditionally requiring expert interpretation. We introduce NMIRacle, a two-stage generative framework that builds on recent paradigms in AI-driven spectroscopy while making minimal assumptions. In the first stage, NMIRacle learns to reconstruct molecular structures from count-aware fragment representations, which capture both the identity of each fragment and its number of occurrences. In the second stage, a spectral encoder maps the input spectra (IR, 1H-NMR, 13C-NMR) into a latent embedding that conditions the pre-trained generator, which is then fine-tuned for direct spectra-to-molecule generation. This formulation bridges fragment-level chemical modeling with spectral evidence, yielding accurate molecular predictions. Empirical results demonstrate that NMIRacle outperforms existing baselines on molecular structure elucidation while maintaining robust performance across increasing levels of molecular complexity.
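To make the idea of a count-aware fragment representation concrete, the sketch below contrasts it with a presence-only encoding. It is only an illustration under stated assumptions: the fragment vocabulary `VOCAB`, the fragment strings, and the helper names `count_aware_encoding` and `presence_only_encoding` are hypothetical and are not taken from NMIRacle, whose actual fragmentation scheme is not specified in this abstract.

```python
from collections import Counter

# Hypothetical fragment vocabulary, chosen for illustration only.
VOCAB = ["C(=O)O", "c1ccccc1", "CH3", "CH2", "OH"]


def count_aware_encoding(fragments, vocab=VOCAB):
    """Return one integer per vocabulary fragment: its number of occurrences."""
    counts = Counter(fragments)
    return [counts[f] for f in vocab]


def presence_only_encoding(fragments, vocab=VOCAB):
    """Return one 0/1 flag per vocabulary fragment: present or absent."""
    present = set(fragments)
    return [1 if f in present else 0 for f in vocab]


# Two illustrative fragment lists that differ only in the number of CH2 units.
short_chain = ["CH3", "CH2", "CH2", "OH"]
long_chain = ["CH3", "CH2", "CH2", "CH2", "CH2", "OH"]

# Presence-only encodings cannot tell the two lists apart.
same_presence = presence_only_encoding(short_chain) == presence_only_encoding(long_chain)

# Count-aware encodings keep the occurrence information and therefore differ.
same_counts = count_aware_encoding(short_chain) == count_aware_encoding(long_chain)

print(same_presence, same_counts)
```

In this toy setting the two fragment lists collapse to the same presence-only vector but remain distinct once counts are kept, which is the kind of information a generator conditioned on fragment occurrences could exploit.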