CHA2: CHemistry Aware Convex Hull Autoencoder Towards Inverse Molecular Design

Optimizing molecular design and discovering novel chemical structures that meet a given objective, such as a high quantitative estimate of drug-likeness (QED), is NP-hard because the combinatorial design space of discrete molecular structures is vast. This makes it nearly impossible to explore the entire search space comprehensively for de novo structures with properties of interest. To address this challenge, reducing the intractable search space to a lower-dimensional latent volume makes it more feasible to examine molecular candidates via inverse design. Autoencoders are well suited to this task: an encoder maps the discrete molecular structure into a latent space, and a decoder maps points in that space back to molecular designs. The continuity of the latent space, which characterizes the discrete chemical structures, provides a flexible representation for inverse design and the discovery of novel molecules. However, exploring this latent space requires some guidance about where to sample in order to generate new structures. We propose using a convex hull surrounding the top molecules, ranked by QED, to enclose a tight subspace of the latent representation as an efficient way to reveal novel molecules with high QED. We demonstrate the effectiveness of the proposed method by using QM9 as the training dataset, along with the Self-Referencing Embedded Strings (SELFIES) representation, to train the autoencoder and carry out inverse molecular design that uncovers novel chemical structures.
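The convex-hull idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the latent vectors and QED scores below are random stand-ins, the helper names `top_k_latents` and `sample_in_hull` are hypothetical, and the Dirichlet-weighted mixing is one simple way to guarantee that samples fall inside the hull, assumed here for illustration.

```python
import numpy as np

def top_k_latents(latents, scores, k):
    """Return the k latent vectors with the highest scores."""
    idx = np.argsort(scores)[-k:]
    return latents[idx]

def sample_in_hull(points, n_samples, rng):
    """Draw samples inside the convex hull of `points`.

    Any convex combination of the points lies in their hull, so we draw
    Dirichlet weights (non-negative, each row summing to 1) and mix the points.
    """
    weights = rng.dirichlet(np.ones(len(points)), size=n_samples)
    return weights @ points

rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 8))  # assumption: stand-in for encoder outputs
qed = rng.uniform(size=1000)          # assumption: stand-in for QED scores

top = top_k_latents(latents, qed, k=50)
candidates = sample_in_hull(top, n_samples=200, rng=rng)
```

In a full pipeline, each row of `candidates` would be passed to the decoder to obtain a SELFIES string. Note that mixing all hull points with Dirichlet weights concentrates samples near the centroid and is not uniform over the hull; a different sampling scheme may suit the method better.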


