Exploring molecular spaces is crucial for advancing our understanding of chemical properties and reactions, and it underpins innovation in materials science, medicine, and energy. This paper examines active learning for molecular discovery using Deep Kernel Learning (DKL), an approach that overcomes limitations of classical Variational Autoencoders (VAEs). Using the QM9 dataset, we contrast DKL with traditional VAEs, which analyze molecular structures based on similarity; the comparison reveals limitations of VAEs that stem from sparse regularities in their latent spaces. DKL, by contrast, correlates structure with properties and thereby creates latent spaces that prioritize molecular functionality. This is achieved by iteratively recalculating the embedding vectors as target properties become experimentally available. The resulting latent spaces are not only better organized but also exhibit distinctive characteristics, such as concentrated maxima representing molecular functionalities and a correlation between predictive uncertainty and error. In addition, exclusion regions form around certain compounds, indicating unexplored areas that may harbor novel functionalities. This study underscores DKL's potential in molecular research, offering new avenues for understanding and discovering molecular functionalities beyond the limitations of classical VAEs.
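For readers less familiar with the method, the core idea can be sketched in standard Gaussian-process notation. The symbols used here ($g_w$, $k_\theta$, $\sigma_n^2$, the acquisition function $a$, and the candidate pool $\mathcal{U}$) are illustrative assumptions and are not notation taken from this paper. A neural network $g_w$ maps a molecular representation $x$ to a latent vector $z = g_w(x)$, a base kernel $k_\theta$ acts on these embeddings, and the usual Gaussian-process posterior supplies both a prediction and an uncertainty for an unmeasured molecule $x_*$:

```latex
% Deep kernel: a base kernel evaluated on learned embeddings
k_{\mathrm{DKL}}(x, x') = k_\theta\bigl(g_w(x),\, g_w(x')\bigr)

% GP posterior mean and variance at a candidate x_*,
% with K_{ij} = k_{\mathrm{DKL}}(x_i, x_j) and (\mathbf{k}_*)_i = k_{\mathrm{DKL}}(x_i, x_*)
\mu(x_*) = \mathbf{k}_*^{\top} \bigl(K + \sigma_n^2 I\bigr)^{-1} \mathbf{y},
\qquad
\sigma^2(x_*) = k_{\mathrm{DKL}}(x_*, x_*) - \mathbf{k}_*^{\top} \bigl(K + \sigma_n^2 I\bigr)^{-1} \mathbf{k}_*

% One active-learning step: choose the next molecule to measure
% (the form of a is an assumption; the abstract does not specify it)
x_{\mathrm{next}} = \arg\max_{x_* \in \mathcal{U}} \; a\bigl(\mu(x_*),\, \sigma(x_*)\bigr)
```

In this formulation, the network weights $w$ and kernel hyperparameters $\theta$ are typically fitted jointly by maximizing the Gaussian-process marginal likelihood on the molecules measured so far. Refitting after each new measurement changes $g_w$, which is one way to read the iterative recalculation of embedding vectors described above.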