Automatic structure elucidation is essential for self-driving laboratories because it enables the system to become truly autonomous. This capability closes the experimental feedback loop, ensuring that machine learning models receive reliable structural information for real-time decision-making and optimization. Herein, we present DiSE, an end-to-end diffusion-based generative model that integrates multiple spectroscopic modalities, including MS, 13C and 1H chemical shifts, HSQC, and COSY, to achieve automated and accurate structure elucidation of organic compounds. By learning the inherent correlations among spectra in a data-driven manner, DiSE achieves superior accuracy, strong generalization across chemically diverse datasets, and robustness to experimental data despite being trained on calculated spectra. DiSE thus represents a significant advance toward fully automated structure elucidation, with broad potential in natural product research, drug discovery, and self-driving laboratories.