This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt a causal Transformer that takes the latent vector in (1) as a prompt. (3) A property prediction model that predicts the target property value of a molecule via a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we gradually shift the model distribution toward the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.
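Read literally, the three components define a joint model over a latent vector $z$, a molecule string $x = (x^{(1)}, \ldots, x^{(T)})$, and a property value $y$. The display below is a sketch of that structure. The symbols $U_\alpha$ (the Unet), $s_\gamma$ (the non-linear regressor), the latent dimension $d$, and the Gaussian form of the property model with variance $\sigma^2$ are our own notation and assumptions, not details taken from the text above. Here $p_\alpha(z)$ denotes the prior density of $z$ induced by pushing white noise $z_0$ through $U_\alpha$.

```latex
\begin{aligned}
z &= U_\alpha(z_0), \qquad z_0 \sim \mathcal{N}(0, I_d), \\
p_\beta(x \mid z) &= \prod_{t=1}^{T} p_\beta\!\left(x^{(t)} \mid x^{(1)}, \ldots, x^{(t-1)}, z\right), \\
p_\gamma(y \mid z) &= \mathcal{N}\!\left(y;\, s_\gamma(z),\, \sigma^2\right), \\
p_\theta(x, y, z) &= p_\alpha(z)\, p_\beta(x \mid z)\, p_\gamma(y \mid z).
\end{aligned}
```

Under this factorization, $x$ and $y$ are conditionally independent given $z$, so the latent vector is the only channel linking the molecule to its property. One plausible way to target a desired value $y^*$ (an assumption here, not stated above) is to sample $z$ from $p_\theta(z \mid y^*) \propto p_\alpha(z)\, p_\gamma(y^* \mid z)$ and then decode a molecule autoregressively from $p_\beta(x \mid z)$.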