The ability of deep learning methods to perform classification and regression tasks relies heavily on their capacity to uncover manifolds in high-dimensional data spaces and project them into low-dimensional representation spaces. In this study, we investigate the structure and character of the manifolds generated by classical variational autoencoder (VAE) approaches and by deep kernel learning (DKL). In the former case, the structure of the latent space is determined by the properties of the input data alone, whereas in the latter, the latent manifold forms through an active learning process that balances the data distribution against the target functionalities. We show that DKL with active learning can produce a more compact and smoother latent space that is more conducive to optimization than those of previously reported methods such as the VAE. We demonstrate this behavior on a simple cards dataset and extend it to the optimization of domain-generated trajectories in physical systems. Our findings suggest that latent manifolds constructed through active learning have a structure better suited to optimization problems, especially in the feature-rich, target-poor scenarios common in domain sciences such as materials synthesis, energy storage, and molecular discovery. Jupyter notebooks encapsulating the complete analysis accompany the article.
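The key contrast above is that a DKL latent space is shaped by the target as well as by the inputs. The pure-Python sketch below illustrates that idea under heavy simplifying assumptions; it is not the authors' implementation, which lives in the accompanying notebooks. The "feature extractor" is a single linear projection of 2-D inputs onto a direction `theta`, picked from a grid by maximizing the Gaussian-process log marginal likelihood, rather than a deep network trained by gradient descent. The names `target`, `LENGTHSCALE`, `NOISE`, `beta`, the RBF base kernel, and the upper-confidence-bound (UCB) acquisition rule are all hypothetical choices made for illustration.

```python
import math

LENGTHSCALE = 0.2  # assumed fixed RBF lengthscale in the latent space
NOISE = 1e-4       # assumed observation-noise variance (also acts as jitter)


def target(x):
    """Hypothetical scalar target: depends only on the first input feature."""
    return -(x[0] - 0.7) ** 2


def embed(x, theta):
    """Toy 'feature extractor': project a 2-D input onto the direction theta."""
    return math.cos(theta) * x[0] + math.sin(theta) * x[1]


def rbf(a, b):
    """Squared-exponential base kernel on scalar latent coordinates."""
    return math.exp(-0.5 * ((a - b) / LENGTHSCALE) ** 2)


def cholesky(K):
    """Lower-triangular L with K = L L^T (K must be symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Forward substitution for L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def solve_upper_t(L, y):
    """Back substitution for L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, theta):
    """Exact GP on embedded inputs: latents, Cholesky factor, weights, log marginal likelihood."""
    z = [embed(x, theta) for x in X]
    n = len(z)
    K = [[rbf(z[i], z[j]) + (NOISE if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = K^{-1} y
    lml = (-0.5 * sum(a * b for a, b in zip(y, alpha))
           - sum(math.log(L[i][i]) for i in range(n))
           - 0.5 * n * math.log(2 * math.pi))
    return z, L, alpha, lml


def gp_predict(zq, z, L, alpha):
    """Posterior mean and variance of the latent function at latent point zq."""
    k = [rbf(zq, zi) for zi in z]
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    return mean, max(1.0 - sum(vi * vi for vi in v), 0.0)


def active_learning(candidates, seeds, steps, beta=2.0):
    """Alternate between learning the embedding and querying the UCB-maximizing candidate."""
    X = list(seeds)
    y = [target(x) for x in X]
    thetas = [k * math.pi / 12 for k in range(12)]  # projection angles covering [0, pi)
    theta = 0.0
    for _ in range(steps):
        # "Train" the embedding: keep the angle with the highest log marginal likelihood.
        fits = {t: gp_fit(X, y, t) for t in thetas}
        theta = max(fits, key=lambda t: fits[t][3])
        z, L, alpha, _ = fits[theta]

        def ucb(c):
            m, v = gp_predict(embed(c, theta), z, L, alpha)
            return m + beta * math.sqrt(v)

        # Query the unlabeled candidate with the highest UCB score.
        pool = [c for c in candidates if c not in X]
        best = max(pool, key=ucb)
        X.append(best)
        y.append(target(best))
    return X, y, theta


if __name__ == "__main__":
    grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
    seeds = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
    X, y, theta = active_learning(grid, seeds, steps=10)
    print(f"best target {max(y):.4f}, learned angle {theta:.3f} rad")
```

Because `theta` is re-selected after every new label, the latent coordinate gradually aligns with the input direction that explains the target. A VAE-style embedding, fixed by the inputs alone, has no such feedback. A real DKL model would instead learn all network weights jointly with the kernel hyperparameters.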