Chiral molecule assignment is crucial for asymmetric catalysis, functional materials, and the pharmaceutical industry. The conventional approach requires theoretical calculation of electronic circular dichroism (ECD) spectra, which is time-consuming and costly. To speed up this process, we apply deep learning to ECD prediction. We first build CMCDS, a large-scale dataset of Chiral Molecular ECD spectra, populated with calculated ECD spectra. We then develop ECDFormer, a Transformer-based model that learns chiral molecular representations and predicts the corresponding ECD spectra with improved efficiency and accuracy. Unlike other spectrum prediction models, ECDFormer focuses on peak properties rather than the whole spectrum sequence, a design inspired by how chiral molecules are assigned in practice. Specifically, ECDFormer predicts the peak properties, including their number, positions, and signs, and then renders the ECD spectrum from these properties. It significantly outperforms other models in ECD prediction and reduces the time needed to acquire an ECD spectrum from between 1 and 100 hours per molecule to 1.5 s.
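The idea of rendering a spectrum from peak properties can be illustrated with a minimal sketch. It is not the ECDFormer implementation: the function name `render_ecd`, the Gaussian band shape, the unit peak heights, and the 10 nm width are all assumptions chosen for illustration, and the peak list is a hypothetical prediction rather than model output.

```python
import math


def render_ecd(peaks, wavelengths, width=10.0):
    """Render a spectrum by summing one signed Gaussian band per peak.

    peaks: list of (position_nm, sign) pairs, with sign +1 or -1.
    wavelengths: wavelengths (nm) at which to evaluate the spectrum.
    width: Gaussian standard deviation in nm (illustrative assumption).
    """
    spectrum = []
    for wl in wavelengths:
        value = 0.0
        for position, sign in peaks:
            # Each peak contributes a unit-height band centered at its position.
            value += sign * math.exp(-((wl - position) ** 2) / (2.0 * width ** 2))
        spectrum.append(value)
    return spectrum


# Hypothetical peak properties: two peaks, positive at 220 nm, negative at 280 nm.
peaks = [(220.0, +1), (280.0, -1)]
wavelengths = [float(w) for w in range(180, 401)]
spectrum = render_ecd(peaks, wavelengths)
```

In this sketch the number of peaks is simply the length of the list, and the sign of each peak determines whether its band points up or down. Those signs are the part of the spectrum that matters most when comparing against a measured ECD curve for assignment.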