After the recent ground-breaking advances in protein structure prediction, one of the remaining challenges in protein machine learning is to reliably predict distributions of structural states. Parametric models of small-scale fluctuations are difficult to fit because of the complex covariance structure between degrees of freedom in the protein chain, which often causes models to violate either local or global structural constraints. In this paper, we present a new strategy for modelling protein densities in internal coordinates, which uses constraints in 3D space to induce covariance structure between the internal degrees of freedom. We illustrate the potential of the procedure by constructing a variational autoencoder with full covariance output induced by the constraints implied by the conditional mean in 3D, and demonstrate that our approach makes it possible to scale density models of internal coordinates to full-size proteins.
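As a rough illustration of how a constraint in Cartesian space can induce covariance between internal degrees of freedom, consider the toy sketch below. It is not the authors' implementation: it assumes a planar chain with turning angles as the only internal coordinates, a first-order (Jacobian) approximation of atom displacements, and a single scalar penalty on those displacements. The names `chain_positions`, `position_jacobian`, `induced_covariance`, `prior_var` and `constraint_weight` are hypothetical and chosen only for this example.

```python
import numpy as np


def chain_positions(angles, bond_length=1.0):
    """Place a planar chain: each bond heading is the cumulative sum of
    the turning angles, and each atom is the cumulative sum of the bonds."""
    headings = np.cumsum(angles)
    steps = bond_length * np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.cumsum(steps, axis=0)  # shape (n_atoms, 2)


def position_jacobian(angles, eps=1e-6):
    """Central finite-difference Jacobian of the flattened atom positions
    with respect to the turning angles."""
    n = len(angles)
    jac = np.zeros((2 * n, n))
    for j in range(n):
        delta = np.zeros(n)
        delta[j] = eps
        diff = chain_positions(angles + delta) - chain_positions(angles - delta)
        jac[:, j] = diff.ravel() / (2 * eps)
    return jac


def induced_covariance(mean_angles, prior_var, constraint_weight):
    """Combine an independent (diagonal) prior over the angles with a
    quadratic penalty on first-order Cartesian displacement (assumed form).
    The penalty adds J^T J to the precision, which couples the angles."""
    jac = position_jacobian(mean_angles)
    precision = np.diag(1.0 / prior_var) + constraint_weight * jac.T @ jac
    return np.linalg.inv(precision)


mean_angles = np.full(20, 0.1)   # mean turning angles (radians)
prior_var = np.full(20, 0.05)    # unconstrained per-angle variance
cov = induced_covariance(mean_angles, prior_var, constraint_weight=10.0)
```

In this sketch the diagonal prior alone would let every angle fluctuate independently, so small errors would accumulate along the chain. Adding the Cartesian penalty yields a full covariance matrix in which angle fluctuations are coupled so that atom positions stay close to those implied by the mean.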